[R] stuck with affy / limma

Maxim Mon, 29 Mar 2010 13:25:18 -0700

Hi,

I have a question concerning the analysis of some affymetrix chips. I
downloaded some of the data from GEO GSE11324 (see below). In doing so I'm
stuck after I identified the probesets with significant changes. I have
problems in assigning probeset specific gene names as well as getting the
genomic coordinates. Furthermore I have no clue how to deal with the fact,
that most genes have different probesets with differential transcriptional
outcomes.



I did this based on the affy and limma manuals like:

targets file:
Name FileName Target
0h1 GSM286031.CEL control
0h2 GSM286032.CEL control
0h3 GSM286033.CEL control
3h1 GSM286034.CEL three
3h2 GSM286035.CEL three
3h3 GSM286036.CEL three
6h1 GSM286037.CEL six
6h2 GSM286038.CEL six
6h3 GSM286039.CEL six


library(affy)
library(limma)
library(vsn)

pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names =
2)
pData(pd)
#### load
a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose =
TRUE)
#### normalize
x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn",
normalize.param
= list(subsample = 1000), pmcorrect.method = "pmonly", summary.method =
"medianpolish")
### genes with highest variation
library(hgu133plus2.db)
rsd <- apply(exprs(x), 1, sd)
sel <- order(rsd, decreasing = TRUE)[1:250]


u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL)
heatmap(exprs(x)[sel, ], labRow=u)
### in this case it works to extract the gene symbol


### limma contrasts
design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3)))
colnames(design) <- c("control", "three", "six")
fit <- lmFit(x, design)
meanSdPlot(x)
contrast.matrix <- makeContrasts(three-control, six-control, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
#### top list
topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M")
library(hgu133plus2.db)
u<-mget(row.names(fit2),hgu133plus2SYMBOL)

How can I produce a topTable result with according gene names, somehow I do
not understand the genelist argument?

Next, I would like to produce a standard clustering of the "fold changes"
observed within (averaged) contrasts 1 (three - control) and 2 (six -
control) and a heatmap presentation of the results. How to extract for
example all fold-changes of those genes with a p-value<0.001 in at least one
of the two contrasts?

The coordinates of the genes I seem to get with
v<-mget(row.names(fit2),hgu133plus2CHRLOC)
v<-mget(row.names(fit2),hgu133plus2CHRLOCEND)
But again I do not know, how to implement it into my fit2 object or topTable
results. Furthermore there are many probesets with multiple mappings, should
these not be excluded from the analysis?

Actually, in the end I'd like to get the corresponding genes' coordinates in
a way, that the maximum region size is given, eg in case of genes with
multiple TSSs.

As mentioned above, I do not know how to deal with the fact, that many genes
are represented with mutliple probesets, often with different fold changes -
is there a general recipe to deal with this question? Furthermore there are
many probesets with multiple mappings, should these not be excluded from the
analysis?

I know it's a lot of questions, so is there a general source of information,
that may help me to overcome the hurdles?

Maxim

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] stuck with affy / limma

Reply via email to