Dear all,

I am trying to fit an lm model with one continuous dependent variable and 11 independent variables, all of them categorical. Some of them have many categories (several dozen in some cases).
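For concreteness, here is a minimal simulated sketch of the situation. The data, the variable names (`y`, `f1`, `f2`), and the level names are invented for illustration, and only two factors are shown instead of 11. The last step merges two levels by hand, which is the kind of operation I would like to automate.

```r
# Simulated stand-in for the real data; all names and sizes are made up.
set.seed(1)
n  <- 500
f1 <- factor(sample(paste0("a", 1:30), n, replace = TRUE))  # 30-level factor
f2 <- factor(sample(paste0("b", 1:12), n, replace = TRUE))  # 12-level factor

# In this toy example only three levels of f1 actually shift the response.
y   <- 2 * (f1 %in% c("a1", "a2", "a3")) + rnorm(n)
dat <- data.frame(y, f1, f2)

# Full model: one coefficient per non-reference level, plus the intercept.
fit <- lm(y ~ f1 + f2, data = dat)
length(coef(fit))

# Count the coefficients whose p-value exceeds 0.05.
pvals <- summary(fit)$coefficients[, "Pr(>|t|)"]
sum(pvals > 0.05)

# Merging by hand: giving two levels the same name collapses them into one.
dat2 <- dat
levels(dat2$f1)[levels(dat2$f1) %in% c("a4", "a5")] <- "a4_a5"
fit2 <- lm(y ~ f1 + f2, data = dat2)

# Compare the in-sample fit of the full and the reduced model.
c(full = summary(fit)$r.squared, merged = summary(fit2)$r.squared)
```

Doing this by hand for every pair of levels across 11 factors is not practical, hence the question.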
I am not interested in statistical inference to a larger population. The objective of my model is to predict my continuous variable as well as possible within the sample. When I run the lm model, I unsurprisingly get many regression coefficients that are not statistically significant. Is there some way to automatically combine levels of a categorical variable if the regression coefficients for the individual levels are not significant? My idea is to find a grouping of the categories that lets me work with fewer levels while keeping or even improving the quality of the predictions.

Thanks,
Michael

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.