Stepwise regression is a (somewhat controversial) approach that is used more commonly for actual prediction tasks and data mining than for basic biological research.
If your primary aim is to understand how a set of climate variables is associated with organismal shape, then exploratory methods such as partial least squares (PLS) or reduced rank regression (RRR) are preferable. If some of these climate variables are unrelated to shape, they will have loadings close to zero. But this is a finding, not an inconvenience. Discarding variables completely should also be based on scientific knowledge, not merely on AIC values, R², or p-values.

Only if efficient out-of-sample prediction is important, and if the potential predictor variables are numerous and not well understood, can stepwise approaches help to select a small number of predictors. In this pragmatic context, it can be advantageous to discard predictor variables that are only weakly related to the response variable.

Multiple regression and reduced rank regression are sensitive to collinearity (i.e., highly correlated predictor variables), but PLS is not. In any case, I would first perform a multivariate analysis (PCA, etc.) of the climate variables to see how they relate to each other and how redundant they are.

You wrote that you want to "determine if there is a reduced model that describes the most amount of shape variation with the least number of climate predictors." Is it really the shape variance that you want to maximize, as in prediction tasks? Or do you want to find the climate variables with the strongest influence on shape (i.e., the largest regression coefficients), as a mode of scientific explanation? Clarifying the actual research question helps in choosing the most appropriate statistical method. Note also that by maximizing explained variance you tend to find the climate variables that vary most in your sample, even though other variables may have a stronger average influence.
For instance, one variable, say maximum temperature in summer, could vary more and explain more shape variance, whereas another variable, say minimum temperature in winter, explains less variance but has a steeper slope. In other words, a 1 °C difference in minimum temperature would have a stronger average effect on shape than a 1 °C difference in maximum temperature.

Hope this helps!

Best,
Philipp Mitteroecker

On Tuesday, March 23, 2021 at 4:22:26 AM UTC+1 [email protected] wrote:
> Hi Everybody,
>
> Let's say you have a bunch of climate variables (20+) that you want to test against Procrustes-aligned coordinates. You reduce this down to, say, six variables using a stepwise procedure to exclude highly correlated variables using the 'vifstep' function ("usdm" package).
>
> You would then like to test if these six climate variables can be further reduced in number via a stepwise regression (i.e., Akaike's Information Criterion, AIC) on a 'procD.lm' full model, in order to determine if there is a reduced model that describes the most amount of shape variation with the least number of climate predictors. But the 'step' ("stats") and 'stepAIC' ("MASS" package) functions don't appear to work on this kind of data/model, for arrays or matrices.
>
> How would you go about this?
>
> I found a very similar question posted on ResearchGate back in 2019, which, as of now, has zero answers. So I thought I'd try here. Any ideas are greatly appreciated.
>
> Best regards,
> Rex

--
You received this message because you are subscribed to the Google Groups "Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/morphmet2/897e43b6-d6e6-438a-9b90-32918e5735e7n%40googlegroups.com.
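The summer-maximum versus winter-minimum temperature example in the reply can be made concrete with a small simulation. This is a sketch in Python with NumPy rather than R, and everything in it is hypothetical: the sample size, the number of shape variables, the temperature spreads and the slopes are made-up values, and the simulated shape matrix merely stands in for Procrustes coordinates. For a predictor x with slope vector b that is uncorrelated with the other predictors, the shape variance it accounts for is roughly |b|² var(x). Here that gives 0.5² × 4² = 4 for the summer maximum versus 1² × 1² = 1 for the winter minimum, even though the winter minimum has twice the slope.

```python
import numpy as np


def simulate_shape(n=500, p=10, seed=1):
    """Simulate a hypothetical n x p shape matrix driven by two climate
    predictors with different spreads and different slopes."""
    rng = np.random.default_rng(seed)
    t_max = rng.normal(25.0, 4.0, n)  # summer maximum: large spread (SD 4 °C)
    t_min = rng.normal(-5.0, 1.0, n)  # winter minimum: small spread (SD 1 °C)
    # Random unit directions in shape space for each predictor's effect
    d_max = rng.normal(size=p)
    d_max /= np.linalg.norm(d_max)
    d_min = rng.normal(size=p)
    d_min /= np.linalg.norm(d_min)
    b_max = 0.5 * d_max  # shape change per 1 °C of summer maximum
    b_min = 1.0 * d_min  # twice as much shape change per 1 °C of winter minimum
    noise = rng.normal(0.0, 1.0, (n, p))
    shape = np.outer(t_max, b_max) + np.outer(t_min, b_min) + noise
    return t_max, t_min, shape


def fit_single_predictor(x, Y):
    """Multivariate least-squares regression of Y on one predictor x.
    Returns the norm of the slope vector and the fraction of total
    shape variance (trace of the covariance matrix) explained."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    b = xc @ Yc / (xc @ xc)  # one slope per shape variable
    fitted = np.outer(xc, b)
    r2 = (fitted ** 2).sum() / (Yc ** 2).sum()
    return np.linalg.norm(b), r2


t_max, t_min, shape = simulate_shape()
slope_max, r2_max = fit_single_predictor(t_max, shape)
slope_min, r2_min = fit_single_predictor(t_min, shape)
print(f"summer max: |slope| = {slope_max:.2f}, R^2 = {r2_max:.2f}")
print(f"winter min: |slope| = {slope_min:.2f}, R^2 = {r2_min:.2f}")
```

In this toy setting, a selection criterion based on explained variance (R², or AIC on the same fits) would favour the summer maximum, whereas the slope magnitudes identify the winter minimum as the stronger per-degree influence on shape.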
