Re: [R] fitting a lognormal distribution using cumulative probabilities

ahimsa campos-arceiz Sat, 23 Feb 2008 01:07:45 -0800

Dear Prof Ripley and Dimitris (cc R-list),

thank you very much for your very insightful responses.


I've been checking how to use survreg{survival} to fit a left-censored
lognormal distribution, and I was surprised to find that results are exactly
the same as with fitdistr{MASS}. Here is an example with a larger dataset:

#====================
# data:
data1 <- data.frame(obs.time=c(30,31.98,35.95,37.2,46.4,50.5,54,56,60,62.95,
                  69.4,71.5,74,76,78,79.25,81.1,84.68,90,95.37,100),
                  reps=c(5,47,80,18,20,32,29,8,29,2,7,8,1,4,3,2,3,3,1,2,1))

data2 <- with(data1, data.frame(obs.time=rep(obs.time, reps),
                                event=rep(1,sum(data1$reps))))

# 1. using fitdistr to estimate lognormal parameters
library(MASS)
fit1 <- fitdistr(data2$obs.time, "lognormal"); fit1
#     meanlog       sdlog
#   3.80552970   0.29093294
#  (0.01665877) (0.01177953)

# 2. using survreg for left-censored estimation
library(survival)
fm <- survreg(Surv(obs.time,event,type="left")~1,
      data=data2, dist="lognormal"); summary(fm)
fm$coefficients[[1]]; fm$scale
# [1] 3.80553
# [1] 0.2909329
# results are the identical !!

# plotting ecdf and the fitted curve  =========
# parameters
fit1.mean <- fit1$estimate[[1]]
fit1.sd <-  fit1$estimate[[2]]

plot(ecdf(data2$obs.time), verticals=TRUE, do.p=FALSE, lwd=1.5,
    xlim=c(0,116), ylab="pb", xlab="time")
curve(plnorm(x, meanlog=fit1.mean, sdlog=fit1.sd), add=TRUE, col='red',
lwd=3)
 # the fit looks good to me

#============= end script =========================

I guess this relates to the following sentences from Prof Ripley:

ML fitting can be done for censored data.  However, I don't think
you have a valid description here: it seems you never recorded a time at
which the event had not happened, and the most likely fit is a probability
mass at zero (since this is a perfect explanation for your data).

(which I don't fully understand) and

If you have an ECDF, the jumps give you the data so you can just use
fitdistr().

I have the ECDF and the number of observations is ~150-300. My final
understanding is therefore that fitdistr can be used to fit distributions by
ML over left-censored data, which is equivalent to fit the distribution
using cumulative probabilities (to go back to my original terminology).

Is this understanding correct? I would highly appreciate any feedback,
especially if I am misunderstanding the way to estimate the function
parameters for this type of data.

Thank you very much!

Ahimsa




On Sat, Feb 23, 2008 at 2:35 AM, Prof Brian Ripley <[EMAIL PROTECTED]>
wrote:

> On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote:
>
> > Dear all,
> >
> > I'm trying to estimate the parameters of a lognormal distribution fitted
> > from some data.
> >
> > The tricky thing is that my data represent the time at which I recorded
> > certain events. However, in many cases I don't really know when the
> event
> > happened. I' only know the time at which I recorded it as already
> happened.
>
> So this is a rather extreme form of censoring.
>
> > Therefore I want to fit the lognormal from the cumulative distribution
> > function (cdf) rather than from the probability distribution function
> (pdf).
> >
> > My understanding is that methods based on Maximum Likelihood (e.g.
> fitdistr
> > {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be
> > based on the cdf... however I was unable to use nls{stat} for lognormal.
>
> Not so: ML fitting can be done for censored data.  However, I don't think
> you have a valid description here: it seems you never recorded a time at
> which the event had not happened, and the most likely fit is a probability
> mass at zero (since this is a perfect explanation for your data).
>
> To make any progress with censoring, you need to see both positive and
> negative events.  If you told us that none of these events happened before
> t=15, it would be possible to fit the model (although you would need far
> more data to get a good fit).
>
> Generally code to handle censoring is in survival analysis: e.g. survreg()
> in package survival.  In the terminiology of the latter, all your
> observations are left-censored.
>
> > I found a website that explains how to fit univariate distribution
> functions
> > based on cumulative probabilities, including a lognormal example, in
> Matlab:
> >
> http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html
> >
> > and other programs like TableCurve 2D seem to do this too.
>
> Maybe, but that is a different problem.  If you have an ECDF, the jumps
> give you the data so you can just use fitdistr().  (And you will see
> comparing observed and fitted CDFs in MASS, the book.)
>
>
> > There must be a straightforward method in R which I have overlooked. Any
> > suggestion on how can I estimate these parameters in R or helpful
> references
> > are very much appreciated.
> >
> > (not sure if it helps but) here is an example of my type of data:
> >
> > treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67,
> 41.33,
> >        41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37,
> 26.00
> > ,
> >        32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50,
> 38.30,
> >
> >        32.08)
> >
> > treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.cy73, 35.92, 12.08, 30.00,
> 56.00,
> >        30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92,
> 35.92
> > ,
> >        12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00,
> 35.92
> > ,
> >        30.00)
> >
> > Thank you very much!
> >
> > Ahimsa
> >
> >
> > --
> > ahimsa campos-arceiz
> > www.camposarceiz.com
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  
> http://www.stats.ox.ac.uk/~ripley/<http://www.stats.ox.ac.uk/%7Eripley/>
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



-- 
ahimsa campos-arceiz
www.camposarceiz.com

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fitting a lognormal distribution using cumulative probabilities

Reply via email to