Dear Prof Ripley and Dimitris (cc R-list), thank you very much for your very insightful responses.
I've been checking how to use survreg{survival} to fit a left-censored lognormal distribution, and I was surprised to find that results are exactly the same as with fitdistr{MASS}. Here is an example with a larger dataset: #==================== # data: data1 <- data.frame(obs.time=c(30,31.98,35.95,37.2,46.4,50.5,54,56,60,62.95, 69.4,71.5,74,76,78,79.25,81.1,84.68,90,95.37,100), reps=c(5,47,80,18,20,32,29,8,29,2,7,8,1,4,3,2,3,3,1,2,1)) data2 <- with(data1, data.frame(obs.time=rep(obs.time, reps), event=rep(1,sum(data1$reps)))) # 1. using fitdistr to estimate lognormal parameters library(MASS) fit1 <- fitdistr(data2$obs.time, "lognormal"); fit1 # meanlog sdlog # 3.80552970 0.29093294 # (0.01665877) (0.01177953) # 2. using survreg for left-censored estimation library(survival) fm <- survreg(Surv(obs.time,event,type="left")~1, data=data2, dist="lognormal"); summary(fm) fm$coefficients[[1]]; fm$scale # [1] 3.80553 # [1] 0.2909329 # results are the identical !! # plotting ecdf and the fitted curve ========= # parameters fit1.mean <- fit1$estimate[[1]] fit1.sd <- fit1$estimate[[2]] plot(ecdf(data2$obs.time), verticals=TRUE, do.p=FALSE, lwd=1.5, xlim=c(0,116), ylab="pb", xlab="time") curve(plnorm(x, meanlog=fit1.mean, sdlog=fit1.sd), add=TRUE, col='red', lwd=3) # the fit looks good to me #============= end script ========================= I guess this relates to the following sentences from Prof Ripley: ML fitting can be done for censored data. However, I don't think you have a valid description here: it seems you never recorded a time at which the event had not happened, and the most likely fit is a probability mass at zero (since this is a perfect explanation for your data). (which I don't fully understand) and If you have an ECDF, the jumps give you the data so you can just use fitdistr(). I have the ECDF and the number of observations is ~150-300. My final understanding is therefore that fitdistr can be used to fit distributions by ML over left-censored data, which is equivalent to fit the distribution using cumulative probabilities (to go back to my original terminology). Is this understanding correct? I would highly appreciate any feedback, especially if I am misunderstanding the way to estimate the function parameters for this type of data. Thank you very much! Ahimsa On Sat, Feb 23, 2008 at 2:35 AM, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote: > > > Dear all, > > > > I'm trying to estimate the parameters of a lognormal distribution fitted > > from some data. > > > > The tricky thing is that my data represent the time at which I recorded > > certain events. However, in many cases I don't really know when the > event > > happened. I' only know the time at which I recorded it as already > happened. > > So this is a rather extreme form of censoring. > > > Therefore I want to fit the lognormal from the cumulative distribution > > function (cdf) rather than from the probability distribution function > (pdf). > > > > My understanding is that methods based on Maximum Likelihood (e.g. > fitdistr > > {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be > > based on the cdf... however I was unable to use nls{stat} for lognormal. > > Not so: ML fitting can be done for censored data. However, I don't think > you have a valid description here: it seems you never recorded a time at > which the event had not happened, and the most likely fit is a probability > mass at zero (since this is a perfect explanation for your data). > > To make any progress with censoring, you need to see both positive and > negative events. If you told us that none of these events happened before > t=15, it would be possible to fit the model (although you would need far > more data to get a good fit). > > Generally code to handle censoring is in survival analysis: e.g. survreg() > in package survival. In the terminiology of the latter, all your > observations are left-censored. > > > I found a website that explains how to fit univariate distribution > functions > > based on cumulative probabilities, including a lognormal example, in > Matlab: > > > http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html > > > > and other programs like TableCurve 2D seem to do this too. > > Maybe, but that is a different problem. If you have an ECDF, the jumps > give you the data so you can just use fitdistr(). (And you will see > comparing observed and fitted CDFs in MASS, the book.) > > > > There must be a straightforward method in R which I have overlooked. Any > > suggestion on how can I estimate these parameters in R or helpful > references > > are very much appreciated. > > > > (not sure if it helps but) here is an example of my type of data: > > > > treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67, > 41.33, > > 41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37, > 26.00 > > , > > 32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50, > 38.30, > > > > 32.08) > > > > treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.cy73, 35.92, 12.08, 30.00, > 56.00, > > 30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92, > 35.92 > > , > > 12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00, > 35.92 > > , > > 30.00) > > > > Thank you very much! > > > > Ahimsa > > > > > > -- > > ahimsa campos-arceiz > > www.camposarceiz.com > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Brian D. Ripley, [EMAIL PROTECTED] > Professor of Applied Statistics, > http://www.stats.ox.ac.uk/~ripley/<http://www.stats.ox.ac.uk/%7Eripley/> > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > -- ahimsa campos-arceiz www.camposarceiz.com [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.