Hi there. At first glance this sounded to me like an obvious "no-no" question.
But I ran some trials anyway, and the results looked pretty intriguing.

So, I checked 14 genotypes (8 plants of each, randomly chosen in the
field) on 4 different dates and measured them under 2 different
temperatures. As the response, I have 4 partitions of how light is
absorbed in the leaf, and they all add up to 1 (part1 + part2 + part3 +
part4 = 1).

So I have a data frame with these columns:

plant |  genotype |  date |  temperature |  part1 |  part2 |  part3 |  part4
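As the posting guide asks for reproducible code, here is a minimal sketch of simulated data with the same layout. All values are random. The assumptions are mine, not a description of the real data: each plant is measured on every date at both temperatures, and the four parts are drawn as normalized gamma variables (a Dirichlet draw), so they sum to exactly 1.

```r
# Hypothetical simulated data with the same columns as the real data frame.
# Design assumed: 14 genotypes x 8 plants x 4 dates x 2 temperatures.
set.seed(1)

design <- expand.grid(
  plant_in_genotype = 1:8,
  genotype          = factor(paste0("G", 1:14)),
  date              = factor(paste0("D", 1:4)),
  temperature       = factor(c("T1", "T2"))
)
# Unique plant ID (112 plants), so (1|plant) groups repeated measurements
design$plant <- factor(paste(design$genotype, design$plant_in_genotype,
                             sep = "_"))

# Four positive gamma draws per row, divided by their row sum, give
# compositions that add up to exactly 1 (a Dirichlet(2, 2, 2, 2) draw)
g <- matrix(rgamma(4 * nrow(design), shape = 2), ncol = 4)
parts <- g / rowSums(g)
colnames(parts) <- paste0("part", 1:4)

data <- cbind(design[, c("plant", "genotype", "date", "temperature")], parts)
head(data)
```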

So logic tells me to keep it as simple as this (written schematically;
lme4 has no syntax for a four-column response like this):

model01 <- glmer(part1, part2, part3, part4 ~ genotype:date:temperature +
                   (1|plant), data = data, family = "binomial")

However, I was wondering how these partitions correlate with each other, so
I computed variance inflation factors (VIFs) for them.

Correlations of the variables

           part1      part2      part3      part4
part1  1.0000000 -0.1035692 -0.3913199  0.3611188
part2 -0.1035692  1.0000000 -0.7542708  0.1309893
part3 -0.3913199 -0.7542708  1.0000000 -0.6597187
part4  0.3611188  0.1309893 -0.6597187  1.0000000
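Part of the correlation structure in this table is imposed by the constraint itself. Writing p_1, ..., p_4 for part1 to part4 and assuming p_1 + p_2 + p_3 + p_4 = 1 holds exactly in every row, covariance is linear in its second argument, so for each part:

```latex
\sum_{j=1}^{4} \operatorname{Cov}(p_i, p_j)
  = \operatorname{Cov}\Big(p_i, \sum_{j=1}^{4} p_j\Big)
  = \operatorname{Cov}(p_i, 1) = 0
\quad\Longrightarrow\quad
\operatorname{Var}(p_i) = -\sum_{j \neq i} \operatorname{Cov}(p_i, p_j),
\qquad i = 1, \dots, 4.
```

So each part must covary negatively with at least one other part (unless its variance is zero), whatever the biology, and because p_2 = 1 - p_1 - p_3 - p_4 the full 4 x 4 correlation matrix is singular under exact closure.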

Variance inflation factors

          GVIF
part1  3.881838
part2 16.648054
part3 29.613167
part4  7.335692
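The GVIF column can be cross-checked from the correlation matrix alone: for terms with one degree of freedom, as here, GVIF reduces to the ordinary VIF, which equals the corresponding diagonal element of the inverse correlation matrix of the predictors. A base-R sketch using only the correlations printed above (the object names are my own):

```r
# Correlation matrix of the four parts, copied from the printed table
parts <- c("part1", "part2", "part3", "part4")
R4 <- matrix(c( 1.0000000, -0.1035692, -0.3913199,  0.3611188,
               -0.1035692,  1.0000000, -0.7542708,  0.1309893,
               -0.3913199, -0.7542708,  1.0000000, -0.6597187,
                0.3611188,  0.1309893, -0.6597187,  1.0000000),
             nrow = 4, byrow = TRUE, dimnames = list(parts, parts))

# VIF_j = [R^-1]_jj = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j linearly on all the other predictors
vif4 <- diag(solve(R4))
round(vif4, 3)

# Under exact closure (the four parts summing to 1 in every row) this
# determinant would be 0 and the VIFs infinite
det(R4)
```

With these correlations the sketch gives back the GVIF column (about 29.61 for part3 and 16.65 for part2, for example). Finite VIFs are only possible because this matrix is not singular; if the four parts added up to exactly 1 in every row, the VIFs would be infinite (in practice `solve()` would report the matrix as singular or return enormous values).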

In general, the response variable is not included in this calculation. So,
let's pretend I want to use part2 as my response variable. I noticed that
part2 and part3 are strongly correlated (-0.75). A high correlation between
the response and an explanatory variable is generally seen as a good thing,
but not a high correlation between two explanatory variables. So, let me
exclude part2, which I want to use as the response variable.

Correlations of the variables

           part1      part3      part4
part1  1.0000000 -0.3913199  0.3611188
part3 -0.3913199  1.0000000 -0.6597187
part4  0.3611188 -0.6597187  1.0000000

Variance inflation factors

          GVIF
part1  1.207584
part3  1.859350
part4  1.810761
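Converting VIFs back to R^2 values with R_j^2 = 1 - 1/VIF_j can make them easier to read: each value is the share of one part's variance that the other two explain linearly. A base-R sketch using only the correlations printed above (object names are my own):

```r
# Correlations among the three remaining parts, copied from the table
p <- c("part1", "part3", "part4")
R3 <- matrix(c( 1.0000000, -0.3913199,  0.3611188,
               -0.3913199,  1.0000000, -0.6597187,
                0.3611188, -0.6597187,  1.0000000),
             nrow = 3, byrow = TRUE, dimnames = list(p, p))

vif3 <- diag(solve(R3))   # VIF_j = [R^-1]_jj, matches the GVIF column
r2_j <- 1 - 1 / vif3      # share of each part's variance explained by the other two
round(cbind(VIF = vif3, R2 = r2_j), 3)
```

With these correlations the R^2 values come out at roughly 0.17 (part1), 0.46 (part3) and 0.45 (part4), well below the 0.8 to 0.9 that correspond to the common VIF rules of thumb of 5 to 10.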

So, apart from the fact that part2 is fully determined by part1, part3 and
part4 (part2 = 1 - part1 - part3 - part4), there seem to be no collinearity
problems among the remaining parts. So, apart from this, there should be no
problem in doing this:

library(lme4)

model02 <- glmer(part2 ~ genotype:date:temperature:part1:part3:part4 +
                   (1|plant), data = data, family = "binomial")
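One thing worth checking about model02: if the parts add up to exactly 1, part2 is an exact linear function of the other three, so a linear (identity-link) model with part1, part3 and part4 as additive terms reproduces part2 perfectly, leaving nothing for genotype, date or temperature to explain. model02 itself uses a logit link and only the highest-order interaction, so it is not this model; the sketch below uses simulated, hypothetical compositions and plain `lm()`, not the mixed model, only to show the identity the parts carry into any such model.

```r
# Hypothetical compositions that sum to exactly 1 (not the real data)
set.seed(42)
g <- matrix(rgamma(400, shape = 2), ncol = 4)
comp <- as.data.frame(g / rowSums(g))
names(comp) <- paste0("part", 1:4)

# Because part2 = 1 - part1 - part3 - part4 holds exactly, a linear model of
# part2 on the other three parts recovers that identity with zero residuals
fit <- lm(part2 ~ part1 + part3 + part4, data = comp)
round(coef(fit), 6)        # expect: intercept 1, all three slopes -1
max(abs(residuals(fit)))   # numerically zero
```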

In *model01*, only "temperature" was significant. In *model02*, on the
other hand, only part3 (which is highly correlated with the response
variable) was significant; temperature was not. It appears to me that,
because of its strong correlation with part2, part3 explains the variance
in part2 much better than any other factor that is added.
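The pattern in model02 can be roughly anticipated from the first correlation table alone. Ignoring the design factors, the random effect and the logit link (so this is only a linear, marginal check, not a re-analysis), the share of part2's variance explained by a set of other parts is R^2 = r' S^-1 r, with r the correlations of part2 with those parts and S the correlations among them. A base-R sketch; the helper name `r2_from_cor` is my own:

```r
# Correlation matrix of the four parts, copied from the first table
parts <- c("part1", "part2", "part3", "part4")
R4 <- matrix(c( 1.0000000, -0.1035692, -0.3913199,  0.3611188,
               -0.1035692,  1.0000000, -0.7542708,  0.1309893,
               -0.3913199, -0.7542708,  1.0000000, -0.6597187,
                0.3611188,  0.1309893, -0.6597187,  1.0000000),
             nrow = 4, byrow = TRUE, dimnames = list(parts, parts))

# Multiple R^2 of `y` regressed linearly on the parts in `x`, computed from
# correlations alone: r' S^{-1} r, with r = cor(x, y) and S = cor(x, x)
r2_from_cor <- function(R, y, x) {
  r <- R[x, y]
  S <- R[x, x, drop = FALSE]
  drop(crossprod(r, solve(S, r)))
}

r2 <- c(
  part3_only        = r2_from_cor(R4, "part2", "part3"),
  part1_part4       = r2_from_cor(R4, "part2", c("part1", "part4")),
  part1_part3_part4 = r2_from_cor(R4, "part2", c("part1", "part3", "part4"))
)
round(r2, 3)
```

With the printed correlations this gives about 0.57 for part3 alone, about 0.04 for part1 and part4 together, and about 0.94 for all three (the last matching 1 - 1/16.648 from the first VIF table), which may help explain why including or dropping part3 changes the results so much.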

If I now fit a model03 that does not include part3:

model03 <- glmer(part2 ~ genotype:date:temperature:part1:part4 +
                   (1|plant), data = data, family = "binomial")

I get "temperature" as a significant factor, as well as part4 and the
interaction part1*part4. In this analysis "date" is also marginally
significant, and the p-values are much smaller.


So, when we have partitions that add up to 1, can we use one as the
response variable and the others as explanatory (independent) variables?

-- 
Murilo de Melo Peixoto
PhD candidate- Botany
Department of Ecology and Evolutionary Biology
University of Toronto


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
