Dear R-users,

I want to model a proportion bounded by [0,1] (the % of land fertilized). A large share of the observations are exactly 0 (60%), a smaller share are exactly 1 (10%), and the rest fall in between.
I want to compare several models to see how they perform, but the one I am currently looking at is a zero-one inflated beta model, fitted with the R package gamlss. I am having some trouble with the rather technical documentation of the gamlss package and cannot find answers to the questions below.

1) model

The model below should consist of three submodels: one part that models the probability of y=0 versus y>0 (nu.formula), one part that models the probability of y=1 versus y<1 (tau.formula), and a final part that models all the values in between.

    gam <- gamlss(proportion ~ x1 + x2, nu.formula = ~ x1 + x2, tau.formula = ~ x1 + x2, family = BEINF, data = Alldata)

This is okay, I think.

2) prediction

I would now like to know the predicted probability that an observation has y = 0 or y = 1. I predicted the probability of y = 0 with the code below, but I get values that go far beyond the [0,1] interval, so they cannot be probabilities.

    Alldata$fit_proportion_0 <- predict(gam, what = "nu", type = "response")
    summary(Alldata$fit_proportion_0)

Could somebody explain to me how to obtain the correct probabilities? I think the answer can be found in section 10.8.2, page 215 of the following document (http://www.gamlss.org/wp-content/uploads/2013/01/book-2010-Athens1.pdf). As far as I understand it, the predict function returns a different quantity, which I then have to plug into a certain formula to get the actual probabilities. But I am not sure how to make this work.

3) interpretation

Also, to be sure, I would like to know how to interpret the coefficients of the three submodels and how to use the coefficients of each submodel separately. For the nu and tau models these should be interpreted as log-odds ratios, right? And the model in the middle is just a normal log-model, right?
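To make question 2 concrete, here is a minimal sketch of the conversion as I currently understand it. It assumes the parameterization given in the gamlss.dist help page for BEINF, where nu = p0/p2 and tau = p1/p2 with p2 = 1 - p0 - p1, so that nu and tau are ratios rather than probabilities. The helper name beinf_probs and the numbers in nu_hat and tau_hat are made up for illustration; they are not from my data.

```r
# Convert BEINF parameters nu and tau (response scale) into probabilities.
# Assumption: nu = p0 / p2 and tau = p1 / p2, where p2 = 1 - p0 - p1,
# which gives p0 = nu / (1 + nu + tau) and p1 = tau / (1 + nu + tau).
beinf_probs <- function(nu, tau) {
  stopifnot(all(nu >= 0), all(tau >= 0))
  denom <- 1 + nu + tau
  data.frame(
    p0        = nu / denom,   # P(y = 0)
    p1        = tau / denom,  # P(y = 1)
    p_between = 1 / denom     # P(0 < y < 1)
  )
}

# Hypothetical fitted values for three observations (illustration only)
nu_hat  <- c(0.5, 3, 12)
tau_hat <- c(0.1, 0.5, 1)

probs <- beinf_probs(nu_hat, tau_hat)
print(probs)

# With the fitted model from above, the same idea would be (not run here):
# nu_hat  <- predict(gam, what = "nu",  type = "response")
# tau_hat <- predict(gam, what = "tau", type = "response")
# Alldata$fit_proportion_0 <- beinf_probs(nu_hat, tau_hat)$p0
# Alldata$fit_proportion_1 <- beinf_probs(nu_hat, tau_hat)$p1
```

If this reading is right, it would also explain why the values from predict(gam, what = "nu", type = "response") exceed 1: they would be the ratio p0/p2, not p0 itself.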

4) validity

Finally, I cannot find much information on how to correctly test the validity of this model. Do you test each of the three submodels separately, or is there a test for the entire model at once?

Thank you very much for your help! I am aware that some of these questions are very basic.

Janka

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.