[MORPHMET2] Re: Stepwise regression on ProcD.lm models

[email protected] Wed, 24 Mar 2021 03:29:02 -0700


Stepwise regression is a (somewhat controversial) approach that is used 
more commonly for actual prediction tasks and data mining than for basic 
biological research.

If your primary aim is to understand how a set of climate variables is 
associated with organismal shape, then exploratory methods such as partial 
least squares (PLS) or reduced rank regression (RRR) are preferable. If 
some of these climate variables do not relate to shape, they will have 
loadings close to zero. But this is a finding, not an inconvenience. 
Discarding variables completely should also be based on scientific 
knowledge, not only on AIC, R2, or p-values.
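
To illustrate the point about loadings, here is a minimal sketch. It assumes that "PLS" means two-block PLS, i.e., a singular value decomposition of the cross-covariance matrix between the climate block and the shape block. It is written in plain Python rather than R only to keep it self-contained. All variable names and numbers are hypothetical, and the data are noise-free by construction.

```python
# Toy two-block PLS: first singular vector of the climate-by-shape
# cross-covariance matrix, found by power iteration (standard library only).
# All data are hypothetical and noise-free.

climate = {                      # three centered, unit-variance variables
    "temp":   [-1, -1,  1,  1],
    "precip": [-1,  1, -1,  1],
    "wind":   [ 1, -1, -1,  1],  # by construction unrelated to shape
}
t, p = climate["temp"], climate["precip"]
shape = [                        # two shape variables built from temp and precip only
    [2 * a + b for a, b in zip(t, p)],
    [a - b for a, b in zip(t, p)],
]

n = len(t)
# Cross-covariance matrix C (rows: climate variables, columns: shape variables).
C = [[sum(x * y for x, y in zip(xs, ys)) / n for ys in shape]
     for xs in climate.values()]

# Power iteration on C C^T yields the first left singular vector of C,
# i.e., the climate loadings of the first PLS dimension.
M = [[sum(ci * cj for ci, cj in zip(ri, rj)) for rj in C] for ri in C]
v = [1.0] * len(C)
for _ in range(200):
    w = [sum(m * x for m, x in zip(row, v)) for row in M]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

loadings = dict(zip(climate, v))
print(loadings)   # "wind" gets a loading of zero; "temp" has the largest loading
```

With real, noisy data the loading of an unrelated variable would be small rather than exactly zero, and the climate variables would usually be standardized first.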

Stepwise approaches can help to select a small number of predictors only 
if efficient out-of-sample prediction is important and the potential 
predictor variables are numerous and not well understood. In this 
pragmatic context, it can be advantageous to discard predictor variables 
that are only weakly related to the response variable. 

Multiple regression and reduced rank regression are sensitive to 
collinearity (i.e., highly correlated predictor variables), but PLS is 
not. In any case, I would first perform a multivariate analysis (PCA, 
etc.) of the climate variables to see how they relate and how redundant 
they are. 
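
As a rough sketch of such a redundancy check, the following computes the correlation matrix of three climate variables and the share of total variance carried by the first principal component. The data and variable names are invented for illustration, and plain Python stands in for whatever PCA routine you normally use.

```python
# Redundancy check on hypothetical climate data (standard library only).
climate = {
    "temp_max":  [20, 22, 25, 27, 30, 31],
    "temp_mean": [14, 15, 18, 19, 22, 24],   # built to track temp_max closely
    "precip":    [80, 60, 90, 50, 70, 85],
}

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

names = list(climate)
R = [[corr(climate[a], climate[b]) for b in names] for a in names]

# First principal component of the correlation matrix via power iteration.
v = [1.0] * len(names)
for _ in range(500):
    w = [sum(r * x for r, x in zip(row, v)) for row in R]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]
Rv = [sum(r * x for r, x in zip(row, v)) for row in R]
eigenvalue = sum(a * b for a, b in zip(v, Rv))
# The trace of a correlation matrix equals the number of variables.
share = eigenvalue / len(names)

for name, row in zip(names, R):
    print(name, [round(r, 2) for r in row])
print("PC1 share of total variance:", round(share, 2))
```

A correlation close to 1 between two variables, or a first component that carries most of the total variance, indicates that the predictors are largely redundant.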

You wrote that you want to "determine if there is a reduced model that 
describes the most amount of shape variation with the least number of 
climate predictors." Is it really the shape variance that you want to 
maximize, as in prediction tasks? Or do you want to find the climate 
variables with the strongest influence (i.e., highest regression 
coefficients) on shape, as a mode of scientific explanation? It can help to 
clarify and specify the actual research question in order to choose the best 
statistic. 

Note also that by maximizing variance you tend to find the climate 
variables that vary most in your sample, even though other variables may 
have a stronger average influence. For instance, one variable, say maximum 
temperature in summer, could vary more and explain more shape variance, 
whereas another variable, say minimum temperature in winter, explains less 
variance but has a higher slope. In other words, 1°C difference in min. 
temperature would have a stronger average effect on shape than 1°C 
difference in max. temperature.
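
This distinction can be made concrete with a small worked example. The numbers are invented and noise-free, and a single shape score stands in for multivariate shape. The two temperature variables are uncorrelated by construction, so simple regression slopes equal the multiple regression slopes.

```python
# Hypothetical toy data in °C, centered; values chosen for illustration only.
summer_max = [-6, -6, 6, 6]    # varies a lot across the sample
winter_min = [-1, 1, -1, 1]    # varies little across the sample
# Shape score: 0.5 units per °C of summer max, 2.0 units per °C of winter min.
shape = [0.5 * s + 2.0 * w for s, w in zip(summer_max, winter_min)]

def slope_and_r2(x, y):
    """Least-squares slope of y on x and the fraction of variance explained."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

summer_slope, summer_r2 = slope_and_r2(summer_max, shape)
winter_slope, winter_r2 = slope_and_r2(winter_min, shape)

print("summer max:", summer_slope, round(summer_r2, 2))   # slope 0.5, R2 0.69
print("winter min:", winter_slope, round(winter_r2, 2))   # slope 2.0, R2 0.31
```

Summer maximum explains about 69% of the variance in the shape score only because it varies more in this sample; per degree, winter minimum has four times the effect.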

Hope this helps!

Best,

Philipp Mitteroecker

On Tuesday, March 23, 2021 at 4:22:26 AM UTC+1 [email protected] wrote:

> Hi Everybody,
>
> Let's say you have a bunch of climate variables (20+) that you want to 
> test against Procrustes-aligned coordinates. You reduce this down to, say, 
> six variables using a stepwise procedure to exclude highly correlated 
> variables using the 'vifstep' function ("usdm" package). 
>
> You would then like to test if these six climate variables can be further 
> reduced in number via a Stepwise Regression (i.e., Akaike's Information 
> Criterion, AIC) on a 'procD.lm' full model, in order to determine if there 
> is a reduced model that describes the most amount of shape variation with 
> the least number of climate predictors. But the 'step' ("stats") and 
> 'stepAIC' ("MASS" package) functions don't appear to work on this kind of 
> data/model, for arrays or matrices.
>
> How would you go about this?
>
> I found a very similar question posted on ResearchGate back in 2019, 
> which, as of now, has zero answers. So I thought I'd try here. Any ideas 
> are greatly appreciated.
>
> Best regards,
> Rex
>
>
>
