[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment

Matthias Gralle Wed, 16 Sep 2009 04:10:00 -0700

Hello everybody,

it may be better to have sample data. I have provided data with lesslevels of "gene" and "day" and only ca. 400 data points per condition.


Sample code:
small=as.data.frame(read.csv("small.csv"))
small$species=factor(small$species)
small$gene=factor(small$gene)
small$day=factor(small$day)
small$repl=factor(small$repl)
library(lme4)
model1=lmer(APC~(FITC+species+gene)^3+(1|day)+(1|repl),REML=F,data=small)
model2=lmer(APC~(FITC+species+gene)^2+(1|day)+(1|repl),REML=F,data=small)
anova(model1,model2)

gives me a significant difference, but in fact there are far too manyresidual df, and this is much worse in the original data. I suppose Icannot trust this difference.


model3=lmer(APC~species*gene+(1|day/repl/FITC),data=small)
Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) :
 numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) :
 numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error ?

And finally

model4=lmer(APC~gene*species+(1|day) + (1|repl) +(1+(gene:species)|FITC),data=small)

model5=lmer(APC~gene+species+(1|day) + (1|repl) + (1|FITC),data=small)
anova(model4,model5)

works with this reduced data set, but fails to converge for theoriginal, much larger data set. I am unsure if this is the right kind ofanalysis for the data and there is only a problem of convergence, or ifit is the wrong formula.

Can anybody give me some advice on this problem ? Or should I just stickto reducing the data from each condition to a single parameter, asexplained in my first email below?


Thank you again!

Matthias Gralle wrote:

Hello everybody,
I have been trying for some weeks to state the correct design of myexperiment as a GLM formula, and have not been able to find somethingappropriate in Pinheiro & Bates, so I am posting it here and hopesomebody can help me.
In each experimental condition, described by
1) gene (10 levels, fixed, because of high interest to me)
2) species (2 levels, fixed, because of high interest)
3) day (2 levels, random)
4) replicate (2 levels per day, random),

I have several thousand data points consisting of two variables:

5) FITC (level of transfection of a cell)
6) APC (antibody binding to the cell)
Because of intrinsic and uncontrollable cell-to-cell variation, FITCvaries quite uniformly over a wide range, and APC correlates ratherwell with FITC. In some cases, I pasted day and replicate together asday_repl.
My question is the following:
Is there any gene (in my set of 10 genes) where the species makes adifference in the relation between FITC and APC ? If yes, in what genedoes species have an effect ? And what is the effect of the speciesdifference ?
My attempts are the following:
1. Fit the data points of each experimental condition to a linearequation APC=Intercept+Slope*FITC and analyse the slopes :
lm(Slope~species*gene*day_repl)
This analysis shows clear differences between the genes, but no effectof species and no interaction gene:species.
The linear fit to the cells is reasonably good, but of course does notrepresent the data set completely, so I wanted to incorporate thecomplete data set.
2a. lmer(APC~FITC*species*gene+(1|day)+(1|repl))
This gives extremely significant values for any interaction andvariable because there are >200 000 df. Of course, it cannot be true,because the cells are not really independent. I have done manyvariations of the above, e.g.
2b. lmer(APC~FITC*species*gene+(1|day)+(1+FITC|day_repl)),
but they all suffer from the excess of df.
3. lmer(APC~species*gene+(1|day/repl/FITC) gives several warningmessages like this one:
In repl:day :
numerical expression has 275591 elements: only the first used
4. lmer(APC~gene*species+(1|day_repl) + (1+gene:species|FITC)) ranseveral days, but failed to converge...
Can somebody give me any hint, or do you think the only possibleanalysis is a simplification as in my model 1 ?
By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 ona linux 2.6.24-24-generic kernel on different Intel systems. I amusing the lme4 that came with R 2.8.0.
Thank you very much for your time!

-- Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment

Reply via email to