On Jul 8, 2010, at 2:04 AM, Assa Yeroslaviz wrote:
Hello R users,
I'm trying to extract random samples from a big array I have.
I have a data frame of over 40k lines and would like to produce
around 50 random samples of around 200 lines each from this array.
This is the matrix:
  ID       xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c yyy_1c _2c
1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328  1.563523
2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322
3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340  1.548940
4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177  9.828757
5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437  4.045098
6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616  7.953706
...
I tried to do it with a for loop:
genelist <- read.delim("/user/R/raw_data.txt")
rownames(genelist) <- genelist[,1]
genes <- rownames(genelist)
One method:
totsize <- 50 * 200
# create matrix of indices
smatrix <- matrix(sample( 1:length(genelist$ID), totsize), nrow=200,
ncol=50)
# Then any one sample would be (for i in 1:50):
genelist[ smatrix[, i], ]
You do need to decide whether this approach, which creates 50 mutually
exclusive samples (if the IDs are unique), is really what you want,
since they are not truly independent draws. I think this could be an
issue with a universe:sample ratio of about 4:1 (roughly 40,000 rows
against 50 * 200 = 10,000 sampled). It's not a bootstrap sample. You
could add replace=TRUE to the sample() call to fix that.
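If independent draws are what you want, another option is to draw each sample separately, so a row cannot repeat within a sample but may turn up in more than one. This is only a rough sketch: the small data frame below is made-up stand-in data rather than your file, and the names n_samples, sample_size, idx and samples are my own.

```r
# Toy stand-in for genelist (hypothetical data, not the poster's file)
set.seed(42)
genelist <- data.frame(ID = paste0("A_", 1:40000),
                       value = rnorm(40000),
                       stringsAsFactors = FALSE)

n_samples   <- 50
sample_size <- 200

# Each column of idx is one independent draw of row indices, taken
# without replacement within the draw; the same row may appear in
# more than one column.
idx <- replicate(n_samples, sample(nrow(genelist), sample_size))

# Keep the sampled rows in a list so they stay available afterwards
samples <- lapply(seq_len(n_samples), function(i) genelist[idx[, i], ])
```

samples[[i]] is then the i-th sampled data frame, and idx[, i] holds the row numbers it was built from.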
--
David
x <- 1:40000
set <- matrix(nrow = 50, ncol = 11)
for(i in c(1:50)){
set[i] <-sample(x,50)
print(c(i,"->", set), quote = FALSE)
}
which basically does the trick, but I just can't save the results
outside the loop.
After having the random sets of lines, it wasn't a problem to extract
the lines from the array using subset.
genSet1 <-sample(x,50)
random1 <- genes %in% genSet1
subsetGenelist <- subset(genelist, random1)
Is there a different way of creating these random vectors, or of saving
the loop results outside the loop so I can work with them?
Thanks a lot
Assa
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT