Agreed on the ranking of (1) vs (2)
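
For the archive, a rough sketch of what (1) could look like in base R. The data frame `dat`, its columns `f` and `y`, and the 70/30 proportion are all made up for illustration and are not taken from Gabor's data; treat it as one possible way to do the split, not the only one.

```r
## Sketch of option (1): do the train/test split within each level of the
## factor, so that every level in the test set also appears in training.
## All names and numbers below are hypothetical.
set.seed(42)
dat <- data.frame(
  f = factor(rep(c("r", "g", "b"), times = c(30, 20, 10))),
  y = rep(c(0, 1), 30)
)

## Row indices grouped by level of f.
idx_by_level <- split(seq_len(nrow(dat)), dat$f)

## Take roughly 70% of the rows of each level for training (at least one).
## Indexing with sample.int() avoids the sample() surprise when a level
## has a single row.
train_idx <- unlist(lapply(idx_by_level, function(i) {
  i[sample.int(length(i), size = max(1, floor(0.7 * length(i))))]
}), use.names = FALSE)

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

## Every level present in test was seen in training.
stopifnot(all(test$f %in% train$f))

fit  <- glm(y ~ f, data = train, family = "binomial")
pred <- predict(fit, newdata = test, type = "response")
```

A level with only one row ends up entirely in training here, so it is never tested; whether that is acceptable depends on the data.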

On Sun, Nov 20, 2022 at 1:30 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:

> I like option 1. Option 2 may cause problems if you are pooling groups
> that do not go together. This is especially a problem if you know that the
> data is missing some groups. I would consider dropping rare groups - or
> compare results between pooling and dropping options. If the answer is the
> same in both cases then use the approach that makes your life easier with
> reviewers/clients. If the answer is different then I would go with dropping
> rare categories, or present both and highlight the difference in outcome. A
> third option is to gather more data.
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Bert Gunter
> Sent: Sunday, November 20, 2022 1:06 PM
> To: Mitchell Maltenfort <mmal...@gmail.com>
> Cc: R-help <R-help@r-project.org>
> Subject: Re: [R] test logistic regression model
>
> [External Email]
>
> I think (2) might be a bad idea if one of the "sparse" categories has high
> predictive power. You'll lose it when you pool, will you not?
> Also, there is the problem of subjectively defining "sparse."
>
> However, 1) seems quite sensible to me. But IANAE.
>
> -- Bert
>
> On Sun, Nov 20, 2022 at 9:49 AM Mitchell Maltenfort <mmal...@gmail.com> wrote:
> >
> > Two possible fixes occur to me
> >
> > 1) Redo the test/training split but within levels of factor - so you
> > have the same split within each level and each level accounted for in
> > training and testing
> >
> > 2) if you have a lot of levels, and perhaps sparse representation in a
> > few, consider recoding levels to pool the rare ones into an "other"
> > category
> >
> > On Sun, Nov 20, 2022 at 11:41 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
> >>
> >> small reprex:
> >>
> >> set.seed(5)
> >> dat <- data.frame(f = rep(c('r','g'), 4), y = runif(8))
> >> newdat <- data.frame(f = rep(c('r','g','b'), 2))
> >> ## convert values in newdat not seen in dat to NA
> >> is.na(newdat$f) <- !(newdat$f %in% dat$f)
> >> lmfit <- lm(y ~ f, data = dat)
> >>
> >> ## Result:
> >> > predict(lmfit, newdat)
> >>         1         2         3         4         5         6
> >> 0.4374251 0.6196527        NA 0.4374251 0.6196527        NA
> >>
> >> If this does not suffice, as Rui said, we need details of what you did.
> >> (predict.glm works like predict.lm)
> >>
> >>
> >> -- Bert
> >>
> >>
> >> On Sun, Nov 20, 2022 at 7:46 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >> >
> >> > At 15:29 on 20/11/2022, Gábor Malomsoki wrote:
> >> > > Dear Bert,
> >> > >
> >> > > Yes, I was trying to fill the non-existent categories with NAs, but
> >> > > the suggested solutions on stackoverflow.com unfortunately did not work.
> >> > >
> >> > > Best regards
> >> > > Gabor
> >> > >
> >> > >
> >> > > Bert Gunter <bgunter.4...@gmail.com> wrote on Sun, 20 Nov 2022, 16:20:
> >> > >
> >> > >> You can't predict results for categories that you've not seen
> >> > >> before (think about it). You will need to remove those cases
> >> > >> from your test set (or convert them to NA and predict them as NA).
> >> > >>
> >> > >> -- Bert
> >> > >>
> >> > >> On Sun, Nov 20, 2022 at 7:02 AM Gábor Malomsoki
> >> > >> <gmalomsoki1...@gmail.com> wrote:
> >> > >>
> >> > >>> Dear all,
> >> > >>>
> >> > >>> I have created a logistic regression model
> >> > >>> on the train df:
> >> > >>> mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
> >> > >>>
> >> > >>> then I try to predict with the test df:
> >> > >>> Predict <- predict(mymodel1, newdata = test, type = "response")
> >> > >>> then I get this error message:
> >> > >>> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
> >> > >>> Factor "TG_KraftF5" has new levels
> >> > >>>
> >> > >>> I have tried different proposals from stackoverflow, but
> >> > >>> unfortunately they did not solve the problem.
> >> > >>> Do you have any idea how to test a logistic regression model
> >> > >>> when you have different levels in train and in test df?
> >> > >>>
> >> > >>> Thank you in advance
> >> > >>> Regards,
> >> > >>> Gabor
> >> > >>>
> >> > >>
> >> > >
> >> >
> >> > hello,
> >> >
> >> > What exactly didn't work? You say you have tried the solutions
> >> > found in stackoverflow but without a link, we don't know which
> >> > answers to which questions you are talking about.
> >> > Like Bert said, if you assign NA to the new levels, present only in
> >> > test, it should work.
> >> >
> >> > Can you post links to what you have tried?
> >> >
> >> > Hope this helps,
> >> >
> >> > Rui Barradas
> >>
> >
> > --
> > Sent from Gmail Mobile
>
--
Sent from Gmail Mobile

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.