[R] question about reproducibility/consistency of principal component and lda directions in R

David Romano Fri, 08 Feb 2013 11:15:58 -0800

Hi everyone,

I'm not sure how to ask this question most clearly, but I hope that giving
the context in which it arises will help. I'm trying to compare the brain
images of two patient populations. Each image is composed of voxels (the 3D
analogue of pixels), and I have two images per patient: one reflecting grey
matter concentration at each voxel, and the other reflecting white matter
concentration at each voxel.

I determined the groups by means of an analysis that used information from
both types of images, and my aim was to get a rough idea of where in the
brain the two groups differ most strikingly.

My first attempt was to replace, voxel by voxel, the bivariate grey/white
data with a single univariate measure, namely the first principal component
score. From these scores I calculated Cohen's d to get a rough estimate of
the effect size at each voxel. The resulting brain images show a very clean
separation into meaningful brain regions, some with negative effect sizes
and some with positive ones.
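The per-voxel procedure described above can be sketched as follows. This is a minimal illustration in Python rather than R, assuming the first principal component is fitted on both groups pooled at each voxel and that Cohen's d uses the pooled standard deviation. The function names and sample values are hypothetical, and the closed-form 2x2 eigenvector is not how R's prcomp() computes its rotation.

```python
import math
import statistics

def first_pc(points):
    """Unit leading eigenvector of the 2x2 sample covariance of (grey, white)
    pairs, plus the mean used for centring."""
    gs = [p[0] for p in points]
    ws = [p[1] for p in points]
    mg, mw = statistics.mean(gs), statistics.mean(ws)
    sgg, sww = statistics.variance(gs), statistics.variance(ws)
    sgw = sum((g - mg) * (w - mw) for g, w in points) / (len(points) - 1)
    # Largest eigenvalue of [[sgg, sgw], [sgw, sww]] in closed form.
    lam = (sgg + sww) / 2 + math.hypot((sgg - sww) / 2, sgw)
    # (sgw, lam - sgg) solves (S - lam*I) v = 0; fall back to an axis if sgw == 0.
    if sgw != 0:
        vx, vy = sgw, lam - sgg
    elif sgg >= sww:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    n = math.hypot(vx, vy)
    return (vx / n, vy / n), (mg, mw)

def pc_scores(points, v, center):
    """Project centred (grey, white) pairs onto the unit direction v."""
    return [(g - center[0]) * v[0] + (w - center[1]) * v[1] for g, w in points]

def cohens_d(a, b):
    """Cohen's d for a minus b, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled)

def voxel_effect_size(group1, group2):
    """Cohen's d (group1 minus group2) on first-PC scores at one voxel,
    with the PCA fitted on both groups pooled (an assumption)."""
    v, center = first_pc(group1 + group2)
    return cohens_d(pc_scores(group1, v, center), pc_scores(group2, v, center))

# Hypothetical (grey, white) concentrations at a single voxel.
g1 = [(0.62, 0.30), (0.58, 0.33), (0.65, 0.28), (0.60, 0.31)]
g2 = [(0.50, 0.40), (0.47, 0.42), (0.52, 0.37), (0.49, 0.41)]
print(voxel_effect_size(g1, g2))
```

Note that the sign of the returned d depends on which of the two possible signs the closed-form eigenvector happens to have.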

What puzzles me is how clean this separation into brain regions is, given
that the meaning of positive and negative depends on the choice of the
first principal component direction at each voxel, and that choice is, in
principle (no pun intended, sorry!), arbitrary: an eigenvector and its
negative are equally valid choices of direction.
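To make the sign issue concrete: negating the direction negates every score, which negates the group mean difference while leaving the pooled standard deviation unchanged, so Cohen's d flips sign but keeps its magnitude. The Python sketch below uses hypothetical data and a stand-in unit direction rather than one computed by PCA. It also shows one possible convention (making the grey-matter loading non-negative) that pins the sign down; that convention is an illustrative assumption, not something R applies.

```python
import statistics

def cohens_d(a, b):
    """Cohen's d for a minus b, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled ** 0.5

def scores(points, center, v):
    """Project centred (grey, white) pairs onto the direction v."""
    return [(g - center[0]) * v[0] + (w - center[1]) * v[1] for g, w in points]

def fix_sign(v):
    """Hypothetical convention: flip v so its grey-matter loading is >= 0."""
    return v if v[0] >= 0 else (-v[0], -v[1])

group1 = [(0.62, 0.30), (0.58, 0.33), (0.65, 0.28), (0.60, 0.31)]
group2 = [(0.50, 0.40), (0.47, 0.42), (0.52, 0.37), (0.49, 0.41)]
pooled = group1 + group2
center = (statistics.mean(p[0] for p in pooled),
          statistics.mean(p[1] for p in pooled))

v = (0.8, -0.6)              # stand-in unit direction (not computed by PCA)
v_flipped = (-v[0], -v[1])   # the equally valid opposite eigenvector

d = cohens_d(scores(group1, center, v), scores(group2, center, v))
d_flipped = cohens_d(scores(group1, center, v_flipped),
                     scores(group2, center, v_flipped))
# After applying the convention, both choices give the same d.
d_fixed = cohens_d(scores(group1, center, fix_sign(v_flipped)),
                   scores(group2, center, fix_sign(v_flipped)))
print(d, d_flipped, d_fixed)
```

Any such convention is only meaningful if it is applied identically at every voxel; otherwise neighbouring voxels can still disagree in sign for reasons unrelated to the data.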

So here are my questions: Does the algorithm used in R produce the same
principal component directions if applied to the same data repeatedly?
And if so, should the directions chosen by the algorithm change
continuously with the data? For example, if one data set were obtained by
adding a small amount of noise to another, should the resulting
directions be close to each other (as opposed to close to each other's
negatives)? (Assume the data are far from being "singular" in some vague
sense I'm not sure how to make precise.)
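One way to make the "far from singular" condition precise for a 2x2 covariance matrix is the gap between its two eigenvalues. When the gap is large, a small perturbation moves the leading axis only slightly, although a solver is still free to return either sign; when the eigenvalues are nearly equal, the axis itself can swing. The Python sketch below uses hypothetical covariance entries and hypothetical helper names. It compares axes with |cos(angle)| so that the sign is ignored, and aligns a direction with a reference by flipping it when their dot product is negative.

```python
import math

def leading_axis(sxx, sxy, syy):
    """Unit leading eigenvector of [[sxx, sxy], [sxy, syy]].

    The sign is whatever this closed form yields; another eigen-solver
    could equally return the negated vector."""
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    if sxy != 0:
        vx, vy = sxy, lam - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    n = math.hypot(vx, vy)
    return vx / n, vy / n

def eigengap(sxx, sxy, syy):
    """Difference between the two eigenvalues; a small gap means an unstable axis."""
    return 2 * math.hypot((sxx - syy) / 2, sxy)

def abs_cos(u, v):
    """|cos(angle)| between unit vectors: 1 means same axis, sign ignored."""
    return abs(u[0] * v[0] + u[1] * v[1])

def align(v, ref):
    """Flip v if it points away from the reference direction ref."""
    dot = v[0] * ref[0] + v[1] * ref[1]
    return v if dot >= 0 else (-v[0], -v[1])

# Well-separated eigenvalues: a small perturbation barely moves the axis.
u = leading_axis(4.0, 1.0, 1.0)
v = leading_axis(4.05, 0.98, 1.02)
print(eigengap(4.0, 1.0, 1.0), abs_cos(u, v))

# Nearly equal eigenvalues: a perturbation of similar size swings the axis.
p = leading_axis(1.0, 0.01, 1.0)
q = leading_axis(1.0, -0.01, 1.0)
print(eigengap(1.0, 0.01, 1.0), abs_cos(p, q))

# Sign alignment against a reference direction (e.g. a fixed template vector).
print(align((-u[0], -u[1]), u))
```

In this sketch the sign comes from a fixed formula and so varies continuously with well-conditioned inputs, but a library eigen-solver makes no such promise, which is why comparing with |cos| or aligning to a reference is the safer check.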

My second attempt was the same procedure using the first LDA (linear
discriminant analysis) scores instead, so I have the same questions about
LDA directions, too.
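For two groups, Fisher's linear discriminant direction can be written as w = S_W^{-1} (m_1 - m_2), where S_W is the pooled within-group scatter matrix and m_1, m_2 are the group means. Defined this way the sign is not arbitrary: w . (m_1 - m_2) = (m_1 - m_2)^T S_W^{-1} (m_1 - m_2) > 0 whenever S_W is positive definite, so group 1 always projects higher and Cohen's d (group 1 minus group 2) on these scores comes out positive at every voxel. The Python sketch below assumes this textbook formulation and hypothetical data; how MASS::lda() scales and signs its discriminants may differ, and this sketch does not rely on it.

```python
import statistics

def fisher_direction(a, b):
    """Two-class Fisher direction w = Sw^{-1} (mean_a - mean_b) for 2-D points."""
    ma = (statistics.mean([p[0] for p in a]), statistics.mean([p[1] for p in a]))
    mb = (statistics.mean([p[0] for p in b]), statistics.mean([p[1] for p in b]))
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for pts, m in ((a, ma), (b, mb)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("within-class scatter matrix is singular")
    dmx, dmy = ma[0] - mb[0], ma[1] - mb[1]
    # Solve Sw w = (ma - mb) with the explicit 2x2 inverse.
    return ((syy * dmx - sxy * dmy) / det, (sxx * dmy - sxy * dmx) / det)

def project(points, w):
    """Discriminant scores: dot product of each point with w."""
    return [x * w[0] + y * w[1] for x, y in points]

group1 = [(0.62, 0.30), (0.58, 0.33), (0.65, 0.28), (0.60, 0.31)]
group2 = [(0.50, 0.40), (0.47, 0.42), (0.52, 0.37), (0.49, 0.41)]
w = fisher_direction(group1, group2)
gap = statistics.mean(project(group1, w)) - statistics.mean(project(group2, w))
print(w, gap)
```

Swapping the group order negates w, so under this definition the sign of the effect size only records which group was listed first.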

Any light you could shed on these questions would be very welcome!

Thanks in advance,
David Romano


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
