On Apr 12, 2011, at 08:45, Achim Zeileis wrote:

> On Mon, 11 Apr 2011, ty ty wrote:
>
>> Hello, dear experts. I don't have much experience in building
>> regression models, so sorry if this is too simple and not very
>> interesting question.
>> Currently I'm working on the model that have to predict proportion of
>> the debt returned by the debtor in some period of time. So the
>> dependent variable can be any number between 0 and 1 with very high
>> probability of 0 (if there are no payment) and if there are some
>> payments it can very likely be 1 (all debt paid) although can be any
>> number from 0 to 1.
>> Not having much knowledge in this area I can't think about any
>> appropriate model and wasn't able to find much on the Internet. Can
>> anyone give me some ideas about possible models, any information
>> on-line and some R functions and packages that can implement it.
>> Thank you in advance for any help.
>
> Beta regression is one possibility to model proportions in the open unit
> interval (0, 1). It is available in R in the package "betareg":
>
>   http://CRAN.R-project.org/package=betareg
>   http://www.jstatsoft.org/v34/i02/
>
> If 0 and 1 can occur, some authors have suggested to scale the response so
> that 0 and 1 are avoided. See the paper linked above for an example. If,
> however, there are many 0s and/or 1s, one might want to take a hurdle or
> inflation type approach. One such approach is implemented in the "gamlss"
> package:
>
>   http://CRAN.R-project.org/package=gamlss
>   http://www.jstatsoft.org/v23/i07/
>   http://www.gamlss.org/
>
> The hurdle approach can be implemented using separate building blocks.
> First a binary regression model that captures whether the dependent variable
> is greater than 0 (i.e., crosses the hurdle): glm(I(y > 0) ~ ...,
> family = binomial). Second a beta regression for only the observations in (0,
> 1) that crossed the hurdle: betareg(y ~ ..., subset = y > 0). A recent
> technical report introduces such a family of models along with many further
> techniques (specialized residuals and regression diagnostics) that are not
> yet available in R:
>
>   http://arxiv.org/abs/1103.2372

Hmm, but this is actually 0-_and_-1 inflated, is it not? Various versions of censored regression come to mind (like a generalized tobit), but I don't know of anything that is spot on. Doubly censored regression is not hard to set up using generic likelihood methods, once you decide on the underlying distribution. Obviously, a basic modelling decision is whether the same parameters apply to the censoring process as to the continuous part.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
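
To make the "generic likelihood methods" remark concrete, here is a minimal sketch of a doubly censored (two-limit tobit) regression on [0, 1] fitted with optim(). It rests on assumptions that are not in the thread: the latent variable is taken to be normal, and the same linear predictor drives both the censoring and the continuous part. The function names negll_tobit2 and fit_tobit2 and the simulated data are made up for illustration only.

```r
## Sketch only, under stated assumptions:
##   latent y* = X b + e,  e ~ N(0, sigma^2)   (normality is an assumption)
##   observed y = 0 if y* <= 0, y = 1 if y* >= 1, y = y* otherwise
## negll_tobit2 and fit_tobit2 are hypothetical names, not from any package.

negll_tobit2 <- function(par, X, y) {
  k     <- ncol(X)
  mu    <- drop(X %*% par[1:k])
  sigma <- exp(par[k + 1])          # optimise log(sigma) so that sigma > 0
  lo    <- y <= 0                   # censored at the lower limit
  hi    <- y >= 1                   # censored at the upper limit
  mid   <- !lo & !hi                # observed in the open interval (0, 1)
  ll <- numeric(length(y))
  ll[lo]  <- pnorm(0, mu[lo], sigma, log.p = TRUE)
  ll[hi]  <- pnorm(1, mu[hi], sigma, lower.tail = FALSE, log.p = TRUE)
  ll[mid] <- dnorm(y[mid], mu[mid], sigma, log = TRUE)
  -sum(ll)                          # negative log-likelihood for optim()
}

fit_tobit2 <- function(formula, data) {
  mf <- model.frame(formula, data)
  X  <- model.matrix(attr(mf, "terms"), mf)
  y  <- model.response(mf)
  ## crude starting values: least squares on the censored response
  start <- c(lm.fit(X, y)$coefficients, log(sd(y)))
  opt <- optim(start, negll_tobit2, X = X, y = y,
               method = "BFGS", hessian = TRUE)
  k <- ncol(X)
  list(coef = opt$par[1:k], sigma = unname(exp(opt$par[k + 1])),
       logLik = -opt$value, convergence = opt$convergence,
       hessian = opt$hessian)
}

## Simulated example (parameter values chosen arbitrarily)
set.seed(1)
n     <- 500
x     <- rnorm(n)
ystar <- 0.4 + 0.8 * x + rnorm(n, sd = 0.5)
y     <- pmin(pmax(ystar, 0), 1)    # pile up mass at 0 and at 1
fit   <- fit_tobit2(y ~ x, data.frame(y = y, x = x))
fit$coef
fit$sigma
```

If the censoring process should have its own parameters, the two pnorm() terms would need their own linear predictors, which moves the model towards the hurdle/inflation approach in the quoted message.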