On Jul 8, 2010, at 2:04 AM, Assa Yeroslaviz wrote:

Hello R users,

I'm trying to extract random samples from a big array I have.

I have a data frame of over 40k lines and would like to produce around 50
random samples of around 200 lines each from this array.

this is the matrix
     ID    xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c yyy_1c _2c
1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328   1.563523
2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494  12.776322
3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340   1.548940
4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177   9.828757
5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437   4.045098
6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616   7.953706
...

I tried to do it with a for loop:

genelist <- read.delim("/user/R/raw_data.txt")
rownames(genelist) <- genelist[,1]
genes <- rownames(genelist)


One method:

totsize  <- 50 * 200
# create matrix of indices
smatrix <- matrix(sample( 1:length(genelist$ID), totsize), nrow=200, ncol=50)

# Then any one sample would be:

# for i in 1:50, e.g. collected in a list:
samples <- lapply(1:50, function(i) genelist[smatrix[, i], ])

You do need to decide whether this approach, which creates 50 mutually exclusive samples (if the IDs are unique), is really what you want, since they are not truly independent draws. I think this could be an issue with a ratio of universe to total sample of about 4:1 (roughly 40,000 rows against 50 * 200 = 10,000 drawn). It's not a bootstrap sample. You could add replace=TRUE in the sample call to fix that.
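If independent draws are what is wanted, one alternative to the single sample() call is to draw each of the 50 samples separately, so that rows are unique within a sample but may recur across samples. This is only a sketch and a different technique from replace=TRUE. It assumes the data frame is called genelist as in the question; the stand-in data and the names n_samples, n_rows, idx and samples are made up for illustration.

```r
# Stand-in for the real data (hypothetical); replace with the result of read.delim()
genelist <- data.frame(ID = paste0("A_", 1:40000), value = rnorm(40000))

set.seed(1)        # only so that the draws are reproducible
n_samples <- 50
n_rows    <- 200

# Each column holds the row positions of one sample; every column is drawn
# independently, without replacement within that column
idx <- replicate(n_samples, sample(nrow(genelist), n_rows))

# List of 50 data frames, one per sample
samples <- lapply(seq_len(n_samples), function(i) genelist[idx[, i], ])
```

With this layout, samples[[3]] would be the third sample, and idx[, 3] the row positions it was built from.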


--
David

x <- 1:40000
set <- matrix(nrow = 50, ncol = 11)

for(i in c(1:50)){
   set[i] <-sample(x,50)
   print(c(i,"->", set), quote = FALSE)
   }

which basically does the trick, but I just can't save the results outside the
loop.
After having the random sets of lines, it wasn't a problem to extract the
lines from the arrays using subset.

genSet1 <-sample(x,50)
random1 <- genes %in% genSet1
subsetGenelist <- subset(genelist, random1)


Is there a different way of creating these random vectors, or of saving the loop
results outside the loop so I can work with them?
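One way to keep what a loop produces is to store each draw in its own slot of a pre-allocated list, rather than assigning a whole vector to a single matrix cell. The following is a minimal sketch under assumptions, not a definitive answer: the stand-in data frame and the names n_samples, n_rows, sets and rows are hypothetical.

```r
# Stand-in for the real data (hypothetical); replace with the result of read.delim()
genelist <- data.frame(ID = paste0("A_", 1:40000), value = rnorm(40000))

n_samples <- 50
n_rows    <- 200

# Pre-allocate a list with one slot per sample
sets <- vector("list", n_samples)

for (i in seq_len(n_samples)) {
  # Draw row positions, then keep the matching rows in slot i
  rows <- sample(nrow(genelist), n_rows)
  sets[[i]] <- genelist[rows, ]
}

# The list persists after the loop, e.g. the first sample:
head(sets[[1]])
```

Indexing by row position also avoids comparing numeric indices with character row names, which an expression such as genes %in% genSet1 would do if the IDs are strings like A_512.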

Thanks a lot

Assa


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
