Dear Jin, Thanks a lot!

On 6/16/22, Jin Li <jinl...@gmail.com> wrote:
> Hi Hana,
>
> ROC (or AUC) is misleading and should not be used to assess model
> performance. For details, please see the references in "Spatial Predictive
> Modeling with R", which also provides some methods (e.g., gbm, rf, svm and
> glmnet) for 1/0 data along with accuracy-based variable selection and
> parameter optimisation.
>
> Hope this helps,
> Jin
>
> On Thu, Jun 16, 2022 at 6:53 AM Hana Tezera <hanatez...@gmail.com> wrote:
>
>> Dear Tim, Thanks a lot. I am looking for different methods. For each
>> method, I want to select the best predictors and report some measures
>> of accuracy. I will then compare the performance of the models by
>> plotting their ROC curves.
>> Best,
>> Hana
>>
>> On 6/15/22, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>> > The uncorrelated nature of smoking and hypertension is a major medical
>> > breakthrough and in contrast to reports like this:
>> > https://pubmed.ncbi.nlm.nih.gov/20550499/ and the literature indicates
>> > the possibility of a relationship between age and hypertension
>> > https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768730/. Depending on the
>> > country, there might be a relationship between smoking and age as
>> > government programs against smoking are developed.
>> >
>> > Are you looking at different models or different methods? I could have
>> > y = x + y + z as one model and y = x + z as another model. Alternatively,
>> > I could be comparing ordinary least squares regression versus maximum
>> > likelihood versus Bayesian linear regression versus nonlinear regression.
>> > The former might use something like the Akaike information criterion. I
>> > am not sure the latter is useful (or possible). For example, I could
>> > approximate an exponential function using a polynomial, but in this
>> > context I see no benefit in doing so even if I could compare the models.
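Tim's first case, comparing models that differ only in their covariates, can be illustrated on the sample data from the original post. The following is a minimal sketch, not a recommended analysis: the object names (dat, full, reduced) are illustrative, and the choice to drop hypertension in the reduced model is an arbitrary assumption made only to have two nested models to compare.

```r
# Sample data copied from the original post in this thread
y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y, age,
                  smoking = factor(smoking),
                  hypertension = factor(hypertension))

# Two nested logistic regression models (the reduced one is an
# assumption for illustration; it simply omits hypertension)
full <- glm(y ~ age + smoking + hypertension, data = dat,
            family = binomial(link = "logit"))
reduced <- glm(y ~ age + smoking, data = dat,
               family = binomial(link = "logit"))

# Akaike information criterion for both models
aic_table <- AIC(full, reduced)
print(aic_table)

# Likelihood-ratio test, valid here because the models are nested
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

A lower AIC indicates the preferred model among those compared. With only 20 observations, any difference should be read cautiously.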
>> >
>> > I do not quite understand why this is being done. It feels like fishing
>> > statistical methods to get the answer that I know is correct. Generally,
>> > one should understand the system well enough to select an appropriate
>> > model rather than try every possible model in the hope something fits.
>> > Of course one sometimes collects extra data in the hope that we do not
>> > miss an important feature. Then forwards/backwards/stepwise methods are
>> > used to identify the "best" model, but this is looking at similar models
>> > that differ only in the list of independent variables.
>> >
>> > However the problem is solved, I would start by trying to determine if
>> > any one model was appropriate. Are the model assumptions satisfied? If
>> > the answer is no, then try another model until you find one that does
>> > satisfy the model assumptions. Alternatively, start with an understanding
>> > of the biology and use the best model. Comparing a biologically
>> > meaningless statistical model to a biologically meaningful one is an
>> > easy choice.
>> >
>> > Tim
>> >
>> > -----Original Message-----
>> > From: anteneh asmare <hanatez...@gmail.com>
>> > Sent: Wednesday, June 15, 2022 1:10 PM
>> > To: Ebert,Timothy Aaron <teb...@ufl.edu>
>> > Cc: r-help@r-project.org
>> > Subject: Re: [R] Model Comparision for case control studies in R
>> >
>> > [External Email]
>> >
>> > Dear Tim, Thanks. The first vector
>> > y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1) is the disease status y
>> > (1=Case, 0=Control). The covariates age, smoking status and hypertension
>> > are independent (uncorrelated). Unconditional logistic regression will
>> > be used. But I need to compare other models with logistic regression
>> > instead of fitting logistic regression alone.
>> > There is no matching in the data, so conditional logistic regression
>> > does not apply.
>> > Best,
>> > Hana
>> > On 6/15/22, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>> >> Disease status is missing from the sample data.
>> >> Are age, disease, smoking, and/or hypertension correlated in any way
>> >> or are they independent (correlation=0)?
>> >> Are the correlations large enough to adversely influence your model?
>> >> Tim
>> >>
>> >> -----Original Message-----
>> >> From: R-help <r-help-boun...@r-project.org> On Behalf Of anteneh asmare
>> >> Sent: Wednesday, June 15, 2022 7:29 AM
>> >> To: r-help@r-project.org
>> >> Subject: [R] Model Comparision for case control studies in R
>> >>
>> >> [External Email]
>> >>
>> >> y<-c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
>> >> age<-c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
>> >> smoking<-c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
>> >> hypertension<-c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
>> >> data<-data.frame(y,age,smoking,hypertension)
>> >> data
>> >> model<-glm(y~age+factor(smoking)+factor(hypertension), data,
>> >>            family = binomial(link = "logit"), na.action = na.omit)
>> >> summary(model)
>> >>
>> >> From the sample data above, I want to analyze a case-control study on
>> >> male individuals, with response variable y, disease status (1=Case,
>> >> 0=Control), and covariates age, smoking status (1=Yes, 0=No) and
>> >> hypertension (1=Yes, 0=No). I want to fit the model to predict the
>> >> disease status using at least two different methods, and to make
>> >> model comparisons. I think logistic regression will be the best fit
>> >> for this case-control study. Do we have other options in addition to
>> >> logistic regression?
>> >> My objective is to fit the model to predict the disease status using
>> >> at least two different methods.
>> >> Kind regards,
>> >> Hana
>
> --
> Jin
> ------------------------------------------
> Jin Li, PhD
> Founder, Data2action, Australia
> https://www.researchgate.net/profile/Jin_Li32
> https://scholar.google.com/citations?user=Jeot53EAAAAJ&hl=en
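For the ROC-based comparison Hana describes, the area under the curve can be computed in base R from ranks (the Mann-Whitney form), without extra packages. This is a rough sketch on the sample data from the original post: auc_rank is a hypothetical helper, not a function from any package, and the object names are illustrative. Both measures are computed on the same 20 observations used for fitting, so they are optimistic, and Jin's caution about relying on ROC/AUC still applies.

```r
# Sample data copied from the original post in this thread
y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y, age,
                  smoking = factor(smoking),
                  hypertension = factor(hypertension))

# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic.
# labels are 0/1, scores are predicted probabilities or any ranking score.
auc_rank <- function(labels, scores) {
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  r <- rank(scores)  # tied scores receive average ranks
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fit <- glm(y ~ age + smoking + hypertension, data = dat,
           family = binomial(link = "logit"))
p_hat <- fitted(fit)  # in-sample predicted probabilities

# In-sample AUC and in-sample accuracy at a 0.5 cutoff
# (the cutoff is an assumption for illustration)
auc_in_sample <- auc_rank(dat$y, p_hat)
acc_in_sample <- mean((p_hat > 0.5) == (dat$y == 1))
print(auc_in_sample)
print(acc_in_sample)
```

The same helper can be applied to predictions from any other method, which makes the methods comparable on one scale. For an honest comparison the predictions should come from held-out data or cross-validation rather than from the fitting data.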
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.