Haakon,
as replicates imply that they all have the same data type, you can put
them into a matrix, which is often faster and needs less memory (though
whether that really matters depends on the number of replicates you
have: with only a few replicates you won't see much effect anyway).
But I find it handy to have the matrix of replicates available as data$rep:
data <- data.frame (plateNo = a, Well = b, rep = I (cbind (c, d, e)))
> data
   plateNo Well rep.c rep.d rep.e
1        1  A01  1312   963  1172
2        1  A02 10464  6715  5628
3        1  A03  3301  3257  3281
4        1  A04  3895  3350  3496
5        1  A05  8731  7389  5701
6        2  A01  7893  6748  5920
7        2  A02  2912  2385  2586
8        2  A03   985   785   809
9        2  A04  1346  1018  1001
10       2  A05   794   314   486
> dim (data)
[1] 10 3
Then:
data$norm <- data$rep / apply (data$rep, 2, ave, plateNo = data$plateNo)
You can also do the division inside the apply:
data$norm <- apply (data$rep, 2, function (x) x / ave (x, plateNo = data$plateNo))
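To check that the two apply() variants agree, and that dividing by the plate mean makes each plate's normalised values average to 1, one can run both on the example data from the original post. A minimal sketch; the object names norm1 and norm2 are only for illustration.

```r
# Example data from the original post, replicates stored as a matrix column
data <- data.frame(plateNo = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                   Well = c("A01", "A02", "A03", "A04", "A05",
                            "A01", "A02", "A03", "A04", "A05"),
                   rep = I(cbind(c = c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794),
                                 d = c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314),
                                 e = c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486))))

# Variant 1: ave() inside apply() returns the plate means, division outside
norm1 <- data$rep / apply(data$rep, 2, ave, plateNo = data$plateNo)

# Variant 2: division inside apply()
norm2 <- apply(data$rep, 2, function(x) x / ave(x, plateNo = data$plateNo))

# Same numbers; norm1 may carry the "AsIs" class of data$rep, so drop it first
all.equal(unclass(norm1), norm2, check.attributes = FALSE)

# Within each plate and replicate the normalised values average to 1
apply(norm2, 2, function(x) tapply(x, data$plateNo, mean))
```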
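If you always have the same number of wells per plate (and the rows are
sorted by plate), you could also "fold" the data$rep matrix into a
wells x plates x replicates array:
arep <- array (data$rep, dim = c (5, 2, 3))       # 5 wells, 2 plates, 3 replicates
anorm <- arep / rep (colMeans (arep), each = 5)   # colMeans gives the plate means
dim (anorm) <- dim (data$rep)
data$norm <- anorm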
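A sketch of the folding approach with the array dimensions derived from the data instead of hard-coded. It assumes, as the approach requires, that rows are sorted by plate and that every plate has the same number of wells (checked up front with stopifnot()); n_wells, n_plates and n_reps are illustrative names. The last line compares the result with the ave() version.

```r
# Example data from the original post, replicates stored as a matrix column
data <- data.frame(plateNo = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                   Well = c("A01", "A02", "A03", "A04", "A05",
                            "A01", "A02", "A03", "A04", "A05"),
                   rep = I(cbind(c = c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794),
                                 d = c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314),
                                 e = c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486))))

# Folding only works if rows are sorted by plate and all plates have equally many wells
wells_per_plate <- unique(as.vector(table(data$plateNo)))
stopifnot(!is.unsorted(data$plateNo), length(wells_per_plate) == 1L)

n_wells  <- wells_per_plate
n_plates <- length(unique(data$plateNo))
n_reps   <- ncol(data$rep)

# R stores arrays column-major, so wells x plates x replicates keeps each plate's wells together
arep  <- array(data$rep, dim = c(n_wells, n_plates, n_reps))
# colMeans() averages over the first dimension: a plates x replicates matrix of plate means
anorm <- arep / rep(colMeans(arep), each = n_wells)
dim(anorm) <- dim(data$rep)

# Compare with the ave()-based version
norm_ave <- apply(data$rep, 2, function(x) x / ave(x, data$plateNo))
all.equal(anorm, norm_ave, check.attributes = FALSE)
```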
Here are some microbenchmark results:
Unit: nanoseconds
         min      lq  median      uq     max
[1,] 1525160 1561280 1627620 1685020 3575719
[2,] 1505641 1539500 1560301 1649081 3538001
[3,]  113321  115041  115821  116881  155681
[4,] 2589800 2627280 2662540 2794920 4646399
1 and 2 are the two apply versions above,
3 is the array version,
4 is your loop.
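To get a feel for how the vectorised approaches scale toward the roughly 300 000 values mentioned in the question, base R's system.time() on synthetic data is one option. A rough sketch: the sizes (1000 plates x 96 wells x 3 replicates) and the log-normal values are hypothetical, and no particular timings are claimed. The row-by-row loop is left out because it compares every row's plate number against all rows, so its work grows with the square of the number of rows.

```r
set.seed(42)
# Hypothetical size, about 300 000 values: 1000 plates x 96 wells x 3 replicates,
# rows sorted by plate
n_plates <- 1000L
n_wells  <- 96L
plateNo  <- rep(seq_len(n_plates), each = n_wells)
reps     <- matrix(rlnorm(n_plates * n_wells * 3), ncol = 3)  # synthetic positive values

# ave()-based version (assignment inside system.time() happens in this environment)
t_ave <- system.time(
  norm_ave <- apply(reps, 2, function(x) x / ave(x, plateNo))
)

# Array-folding version (needs equal wells per plate and sorted rows)
t_array <- system.time({
  arep <- array(reps, dim = c(n_wells, n_plates, 3L))
  norm_array <- arep / rep(colMeans(arep), each = n_wells)
  dim(norm_array) <- dim(reps)
})

# Elapsed seconds per approach; the numbers depend on the machine
c(ave = t_ave[["elapsed"]], array = t_array[["elapsed"]])
```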
HTH
Claudia
On 11.03.2011 18:38, hi Berven wrote:
Hello all,
I'm new to R and trying to figure out how to perform calculations on a large dataset (300,000
data points). I have already written some code to do this, but it is awfully slow. What I want to do is
add a new column for each "rep_" column, in which each value is divided by the mean of all values
in that column with the same "PlateNo". My data is in the following format:
data
PlateNo Well rep_1 rep_2 rep_3
      1  A01  1312   963  1172
      1  A02 10464  6715  5628
      1  A03  3301  3257  3281
      1  A04  3895  3350  3496
      1  A05  8731  7389  5701
      2  A01  7893  6748  5920
      2  A02  2912  2385  2586
      2  A03   985   785   809
      2  A04  1346  1018  1001
      2  A05   794   314   486
To generate it, copy:
a<- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b<- c("A01", "A02", "A03", "A04", "A05", "A01", "A02", "A03", "A04", "A05")
c<- c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794)
d<- c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314)
e<- c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486)
data<- data.frame(plateNo = a, Well = b, rep_1 = c, rep_2 = d, rep_3 = e)
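The operation described here (divide each value by the mean of its plate, separately for each replicate column) can also be written without apply() by computing all plate means at once with base R's rowsum(). This is a different technique from the ones in the replies, shown only as a sketch; normalise_by_plate is a hypothetical helper name, and the final check compares it with the ave() approach.

```r
# Hypothetical helper: divide each value by the mean of its plate, column by column
normalise_by_plate <- function(m, plate) {
  m <- as.matrix(m)
  sums   <- rowsum(m, plate)                            # plates x replicates column sums
  counts <- rowsum(rep(1, length(plate)), plate)[, 1]   # wells per plate, same row order
  means  <- sums / counts                               # recycled down each column
  # Look up each row's plate mean by the row names rowsum() assigns (the plate values)
  m / unname(means[as.character(plate), , drop = FALSE])
}

data <- data.frame(plateNo = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                   Well = c("A01", "A02", "A03", "A04", "A05",
                            "A01", "A02", "A03", "A04", "A05"),
                   rep_1 = c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794),
                   rep_2 = c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314),
                   rep_3 = c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486))

norm_rs <- normalise_by_plate(data[, c("rep_1", "rep_2", "rep_3")], data$plateNo)

# Same result as the ave()-based version
norm_ave <- sapply(data[, c("rep_1", "rep_2", "rep_3")],
                   function(x) x / ave(x, data$plateNo))
all.equal(norm_rs, norm_ave, check.attributes = FALSE)
```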
Here is the code I have come up with:
rows <- length(data$plateNo)
reps <- 3
norm <- list()
for (rep in 1:reps) {
  x <- paste("rep_", rep, sep = "")
  normx <- paste("normalised_", rep, sep = "")
  for (row in 1:rows) {
    plateMean <- mean(data[[x]][data$plateNo == data$plateNo[row]])
    wellData <- data[[x]][row]
    norm[[normx]][row] <- wellData / plateMean
  }
}
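The loop above recomputes a plate mean for every row, and each recomputation compares that row's plate number against all rows, so the work grows with the square of the number of rows. A minimal sketch that keeps Haakon's column naming (rep_1 becomes normalised_1) and writes the results back into data, but lets ave() produce all plate means for a column in one call; it assumes the replicate columns are the only ones whose names start with "rep_".

```r
data <- data.frame(plateNo = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                   Well = c("A01", "A02", "A03", "A04", "A05",
                            "A01", "A02", "A03", "A04", "A05"),
                   rep_1 = c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794),
                   rep_2 = c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314),
                   rep_3 = c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486))

# Find the replicate columns by name instead of hard-coding reps <- 3
rep_cols <- grep("^rep_", names(data), value = TRUE)

for (x in rep_cols) {
  # ave() returns, for every row, the mean of this column over rows with the same plateNo
  data[[sub("^rep_", "normalised_", x)]] <- data[[x]] / ave(data[[x]], data$plateNo)
}

data
```

The outer loop now runs only once per replicate column, so its cost no longer depends on the number of rows.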
Any help or tips would be greatly appreciated!
Thanks,
Haakon
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.