Re: [R] Regression model with proportional dependent variable

peter dalgaard Tue, 12 Apr 2011 00:33:33 -0700

On Apr 12, 2011, at 08:45 , Achim Zeileis wrote:

> On Mon, 11 Apr 2011, ty ty wrote:
> 
>> Hello, dear experts. I don't have much experience in building
>> regression models, so sorry if this is too simple and not a very
>> interesting question.
>> Currently I'm working on a model that has to predict the proportion of
>> the debt returned by the debtor in some period of time. So the
>> dependent variable can be any number between 0 and 1, with a very high
>> probability of 0 (if there is no payment); if there are some
>> payments, it is very likely to be 1 (all debt paid), although it can be
>> any number from 0 to 1.
>> Not having much knowledge in this area, I can't think of any
>> appropriate model and wasn't able to find much on the Internet. Can
>> anyone give me some ideas about possible models, any information
>> online, and some R functions and packages that can implement them?
>> Thank you in advance for any help.
> 
> Beta regression is one possibility to model proportions in the open unit 
> interval (0, 1). It is available in R in the package "betareg":
> 
>  http://CRAN.R-project.org/package=betareg
>  http://www.jstatsoft.org/v34/i02/
> 
> If 0 and 1 can occur, some authors have suggested scaling the response so 
> that 0 and 1 are avoided. See the paper linked above for an example. If, 
> however, there are many 0s and/or 1s, one might want to take a hurdle or 
> inflation type approach. One such approach is implemented in the "gamlss" 
> package:
> 
>  http://CRAN.R-project.org/package=gamlss
>  http://www.jstatsoft.org/v23/i07/
>  http://www.gamlss.org/
> 
> The hurdle approach can be implemented using separate building blocks.
> First a binary regression model that captures whether the dependent variable 
> is greater than 0 (i.e., crosses the hurdle): glm(I(y > 0) ~ ...,
> family = binomial). Second a beta regression for only the observations in (0, 
> 1) that crossed the hurdle: betareg(y ~ ..., subset = y > 0). A recent 
> technical report introduces such a family of models along with many further 
> techniques (specialized residuals and regression diagnostics) that are not 
> yet available in R:
> 
>  http://arxiv.org/abs/1103.2372
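
For concreteness, here is a minimal sketch of the two building blocks quoted
above, on simulated data. The data-generating values, the variable names and
the extra "y < 1" condition in the subset are my assumptions, not part of the
quoted suggestion; the latter is only there because betareg() needs responses
strictly inside (0, 1). The beta part runs only if "betareg" is installed.

```r
## Simulated debt-repayment fractions with point masses at 0 and 1
## (all numbers below are made up for illustration).
set.seed(42)
n    <- 400
x    <- rnorm(n)
paid <- rbinom(n, 1, plogis(-0.5 + x))        # any payment at all?
mu   <- plogis(0.3 + 0.5 * x)                 # mean of the partial payments
frac <- rbeta(n, mu * 5, (1 - mu) * 5)        # partial payments in (0, 1)
full <- rbinom(n, 1, 0.3)                     # some debtors pay in full
y    <- ifelse(paid == 0, 0, ifelse(full == 1, 1, frac))
d    <- data.frame(y = y, x = x)

## Block 1: binary model for crossing the hurdle (y > 0).
hurdle_fit <- glm(I(y > 0) ~ x, family = binomial, data = d)
print(coef(hurdle_fit))

## Block 2: beta regression for the observations strictly inside (0, 1).
if (requireNamespace("betareg", quietly = TRUE)) {
  beta_fit <- betareg::betareg(y ~ x, data = d, subset = y > 0 & y < 1)
  print(coef(beta_fit))
}
```

Note that this leaves the 1s unmodelled; a third binary block for y == 1
among the payers would be one way to cover them.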


Hmm, but this is actually 0-_and_-1 inflated, is it not? Various versions of 
censored regression come to mind (like a generalized tobit), but I don't know 
of anything that is spot on. 

Doubly censored regression is not hard to set up using generic likelihood 
methods, once you decide on the underlying distribution. Obviously, a basic 
modelling decision is whether the same parameters apply to the censoring 
process as to the continuous part. 
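
As a rough sketch of what "generic likelihood methods" could look like here,
the following fits a two-limit tobit with optim(). Everything specific is an
assumption for illustration: the latent variable is taken to be normal, the
data are simulated, and the same linear predictor drives both the censoring
and the continuous part (the simplest of the two modelling choices above).

```r
## Two-limit tobit: latent y* ~ N(a + b*x, sigma^2), observed y is y*
## censored to [0, 1]. Simulated data; true values a=0.4, b=0.8, sigma=0.5.
set.seed(1)
n     <- 500
x     <- rnorm(n)
ystar <- 0.4 + 0.8 * x + rnorm(n, sd = 0.5)
y     <- pmin(pmax(ystar, 0), 1)

## Negative log-likelihood; sigma is parametrized on the log scale
## so that the optimizer works on an unconstrained space.
negll <- function(par, y, x) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])
  ll <- ifelse(y <= 0,
               pnorm(0, mu, sigma, log.p = TRUE),                      # P(y* <= 0)
        ifelse(y >= 1,
               pnorm(1, mu, sigma, lower.tail = FALSE, log.p = TRUE),  # P(y* >= 1)
               dnorm(y, mu, sigma, log = TRUE)))                       # density inside
  -sum(ll)
}

fit <- optim(c(0, 0, 0), negll, y = y, x = x, method = "BFGS", hessian = TRUE)
est <- c(intercept = fit$par[1], slope = fit$par[2], sigma = exp(fit$par[3]))
print(round(est, 3))
```

Swapping in another underlying distribution only means replacing the pnorm()
and dnorm() calls; letting the censoring probabilities have their own
parameters means adding them to "par".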

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
