I think it's OK. You can also use the Hmisc package's varclus function. Frank
細田弘吉 wrote: > > Dear Prof. Harrel, > > Thank you very much for your quick advice. > I will try rms package. > > Regarding model reduction, is my model 2 method (clustering and recoding > that are blinded to the outcome) permissible? > > Sincerely, > > -- > KH > > (11/04/20 22:01), Frank Harrell wrote: >> Deleting variables is a bad idea unless you make that a formal part of >> the >> BMA so that the attempt to delete variables is penalized for. Instead of >> BMA I recommend simple penalized maximum likelihood estimation (see the >> lrm >> function in the rms package) or pre-modeling data reduction that is >> blinded >> to the outcome variable. >> Frank >> >> >> 細田弘吉 wrote: >>> >>> Hi everybody, >>> I apologize for long mail in advance. >>> >>> I have data of 104 patients, which consists of 15 explanatory variables >>> and one binary outcome (poor/good). The outcome consists of 25 poor >>> results and 79 good results. I tried to analyze the data with logistic >>> regression. However, the 15 variables and 25 events means events per >>> variable (EPV) is much less than 10 (rule of thumb). Therefore, I used R >>> package, "BMA" to perform logistic regression with BMA to avoid this >>> problem. >>> >>> model 1 (full model): >>> x1, x2, x3, x4 are continuous variables and others are binary data. >>> >>>> x16.bic.glm<- bic.glm(outcome ~ ., data=x16.df, >>> glm.family="binomial", OR20, strict=FALSE) >>>> summary(x16.bic.glm) >>> (The output below has been cut off at the right edge to save space) >>> >>> 62 models were selected >>> Best 5 models (cumulative posterior probability = 0.3606 ): >>> >>> p!=0 EV SD model 1 model2 >>> Intercept 100 -5.1348545 1.652424 -4.4688 -5.15 >>> -5.1536 >>> age 3.3 0.0001634 0.007258 . >>> sex 4.0 >>> .M -0.0243145 0.220314 . >>> side 10.8 >>> .R 0.0811227 0.301233 . >>> procedure 46.9 -0.5356894 0.685148 . -1.163 >>> symptom 3.8 -0.0099438 0.129690 . . >>> stenosis 3.4 -0.0003343 0.005254 . >>> x1 3.7 -0.0061451 0.144084 . >>> x2 100.0 3.1707661 0.892034 3.2221 3.11 >>> x3 51.3 -0.4577885 0.551466 -0.9154 . >>> HT 4.6 >>> .positive 0.0199299 0.161769 . . >>> DM 3.3 >>> .positive -0.0019986 0.105910 . . >>> IHD 3.5 >>> .positive 0.0077626 0.122593 . . >>> smoking 9.1 >>> .positive 0.0611779 0.258402 . . >>> hyperlipidemia 16.0 >>> .positive 0.1784293 0.512058 . . >>> x4 8.2 0.0607398 0.267501 . . >>> >>> >>> nVar 2 2 >>> 1 3 3 >>> BIC -376.9082 >>> -376.5588 -376.3094 -375.8468 -374.5582 >>> post prob 0.104 >>> 0.087 0.077 0.061 0.032 >>> >>> [Question 1] >>> Is it O.K to calculate odds ratio and its 95% confidence interval from >>> "EV" (posterior distribution mean) and“SD”(posterior distribution >>> standard deviation)? >>> For example, 95%CI of EV of x2 can be calculated as; >>>> exp(3.1707661) >>> [1] 23.82573 -----> odds ratio >>>> exp(3.1707661+1.96*0.892034) >>> [1] 136.8866 >>>> exp(3.1707661-1.96*0.892034) >>> [1] 4.146976 >>> ------------------> 95%CI (4.1 to 136.9) >>> Is this O.K.? >>> >>> [Question 2] >>> Is it permissible to delete variables with small value of "p!=0" and >>> "EV", such as age (3.3% and 0.0001634) to reduce the number of >>> explanatory variables and reconstruct new model without those variables >>> for new session of BMA? >>> >>> model 2 (reduced model): >>> I used R package, "pvclust", to reduce the model. The result suggested >>> x1, x2 and x4 belonged to the same cluster, so I picked up only x2. >>> Based on the subject knowledge, I made a simple unweighted sum, by >>> counting the number of clinical features. For 9 features (sex, side, >>> HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges >>> from 0 to 9. This score was defined as ClinicalScore. Consequently, I >>> made up new data set (x6.df), which consists of 5 variables (stenosis, >>> x2, x3, procedure, and ClinicalScore) and one binary outcome >>> (poor/good). Then, for alternative BMA session... >>> >>>> BMAx6.glm<- bic.glm(postopDWI_HI ~ ., data=x6.df, >>> glm.family="binomial", OR=20, strict=FALSE) >>>> summary(BMAx6.glm) >>> (The output below has been cut off at the right edge to save space) >>> Call: >>> bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family = >>> "binomial", strict = FALSE, OR = 20) >>> >>> >>> 13 models were selected >>> Best 5 models (cumulative posterior probability = 0.7626 ): >>> >>> p!=0 EV SD model 1 model 2 >>> Intercept 100 -5.6918362 1.81220 -4.4688 -6.3166 >>> stenosis 8.1 -0.0008417 0.00815 . . >>> x2 100.0 3.0606165 0.87765 3.2221 3.1154 >>> x3 46.5 -0.3998864 0.52688 -0.9154 . >>> procedure 49.3 0.5747013 0.70164 . 1.1631 >>> ClinicalScore 27.1 0.0966633 0.19645 . . >>> >>> >>> nVar 2 2 1 >>> 3 3 >>> BIC -376.9082 -376.5588 >>> -376.3094 -375.8468 -375.5025 >>> post prob 0.208 0.175 >>> 0.154 0.122 0.103 >>> >>> [Question 3] >>> Am I doing it correctly or not? >>> I mean this kind of model reduction is permissible for BMA? >>> >>> [Question 4] >>> I still have 5 variables, which violates the rule of thumb, "EPV> 10". >>> Is it permissible to delete "stenosis" variable because of small value >>> of "EV"? Or is it O.K. because this is BMA? >>> >>> Sorry for long post. >>> >>> I appreciate your help very much in advance. >>> >>> -- >>> KH >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> ----- >> Frank Harrell >> Department of Biostatistics, Vanderbilt University >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3462919.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > -- > ************************************************* > 神戸大学大学院医学研究科 脳神経外科学分野 > 細田 弘吉 > > 〒650-0017 神戸市中央区楠町7丁目5-1 > Phone: 078-382-5966 > Fax : 078-382-5979 > E-mail address > Office: khos...@med.kobe-u.ac.jp > Home : khos...@venus.dti.ne.jp > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3464392.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.