I use plyr and am learning dplyr and magrittr, but those are just syntactic sugar. What I have been having difficulty with in this thread is the idea that it somehow makes sense to pad vectors with NA values... because I really don't think it does. It seems more like a hammer looking for a nail because that is what it knows how to deal with.
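To make that concrete, here is a minimal sketch of the unpadded alternative, assuming toy data. The object names (expr, Disease1 to Disease3, g1 to g3, one_gene) are made up for illustration and are not Karim's actual objects. It pulls one gene column out of each matrix and keeps the pieces as a list of vectors of different lengths. Whether the downstream correlation function can accept input in that shape is an open question that this sketch does not answer.

```r
# Hypothetical toy data: three "disease" matrices with the same columns
# (genes) but different numbers of rows (samples).
set.seed(1)
genes <- c("g1", "g2", "g3")
expr <- list(
  Disease1 = matrix(rnorm(5 * 3), nrow = 5, dimnames = list(NULL, genes)),
  Disease2 = matrix(rnorm(2 * 3), nrow = 2, dimnames = list(NULL, genes)),
  Disease3 = matrix(rnorm(4 * 3), nrow = 4, dimnames = list(NULL, genes))
)

# Gene "g1" from every matrix, kept as a ragged list: nothing is padded.
one_gene <- lapply(expr, function(m) m[, "g1"])

# Each element keeps its own length (5, 2 and 4 samples here).
sapply(one_gene, length)

# Per-disease summaries work directly on the list, with no NA handling.
sapply(one_gene, mean)
```

The only point is that a list already represents "same columns, different row counts" without inventing missing values; padding becomes necessary only if some later step really requires a rectangular matrix.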
You have a list of matrices with data in them, and switching from for loops to lapply is not in itself going to fix a memory or speed problem... normally the big improvements are in the way you allocate and use your data. Burns talks about pre-allocating the result to speed things up, but I don't understand the problem well enough to suggest an efficient data structure to pre-allocate.

I suggest that Karim read and adhere to the Posting Guide (particularly the bits about giving a reproducible example and posting in plain text so it doesn't get scrambled) if help with optimizing is desired. The discussion at [1] might clarify what "reproducible" means.

I will also mention that efficient algorithms for this subject area are frequently available in the Bioconductor project, so I hope you are not re-inventing the wheel and have already reviewed their tools.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On January 19, 2015 6:11:38 PM PST, Ben Tupper <btup...@bigelow.org> wrote:
>Hi,
>
>On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezh...@gmail.com> wrote:
>
>> Thanks Ben.
>> I need to learn more about apply. Do you have a link or tutorial about
>> apply? The R documentation is very short.
>>
>> How can I obtain:
>> z <- list (Col1, Col2, Col3, Col4......)?
>>
>
>This may not be the most efficient way and there certainly is no error
>checking, but you can wrap one lapply within another as shown below.
>The innermost iterates over your list of input matrices, extracting one
>column specified per list element. The outer lapply iterates over the
>various column numbers you want to extract.
>
>
>getMatrices <- function(colNums, dataList = x){
>   # the number of rows required
>   n <- max(sapply(dataList, nrow))
>   lapply(colNums, function(x, dat, n) {   # iterate along requested columns
>      do.call(cbind, lapply(dat, getColumn, x, len=n))   # iterate along input data list
>   }, dataList, n)
>}
>
>getMatrices(c(1,3), dataList = x)
>
>If we are lucky, one of the plyr package users might show us how to do
>the same with a one-liner.
>
>
>There are endless resources online, here are some gems.
>
>http://www.r-project.org/doc/bib/R-books.html
>http://www.rseek.org/
>http://www.burns-stat.com/documents/
>http://www.r-bloggers.com/
>
>Also, I found "Data Manipulation with R" (
>http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 )
>helpful.
>
>Ben
>
>> Thanks
>>
>>   Ô__
>>  c/ /'_;~~~~kmezhoud
>> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
>> http://bioinformatics.tn/
>>
>> On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btup...@bigelow.org> wrote:
>> Hi again,
>>
>> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezh...@gmail.com> wrote:
>>
>>> Yes, many thanks.
>>> That is my request using lapply.
>>>
>>> do.call(cbind,col1)
>>>
>>> converts col1 to a matrix but does not fill empty values with NA.
>>>
>>> Even for
>>>
>>> matrix(unlist(col1), ncol=5,byrow = FALSE)
>>>
>>>
>>> How can I get Matrix class of col1? And fill empty values with NA?
>>>
>>
>> Perhaps best is to determine the maximum number of rows required
>> first, then force each subset to have that length.
>>
>> # make a list of matrices, each with nCol columns and differing
>> # number of rows
>> nCol <- 3
>> nRow <- sample(3:10, 5)
>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)}, nCol)
>> x
>>
>> # make a simple function to get a single column from a matrix
>> getColumn <- function(x, colNum, len = nrow(x)) {
>>    y <- x[,colNum]
>>    length(y) <- len
>>    y
>> }
>>
>> # what is the maximum number of rows
>> n <- max(sapply(x, nrow))
>>
>> # use the function to get the column from each matrix
>> col1 <- lapply(x, getColumn, 1, len = n)
>> col1
>>
>> do.call(cbind, col1)
>>       [,1] [,2] [,3] [,4] [,5]
>>  [1,]    3    8    5    7    9
>>  [2,]    4    9    6    8   10
>>  [3,]    5   10    7    9   11
>>  [4,]   NA   11    8   10   12
>>  [5,]   NA   12    9   11   13
>>  [6,]   NA   13   NA   12   14
>>  [7,]   NA   14   NA   13   15
>>  [8,]   NA   15   NA   NA   16
>>  [9,]   NA   NA   NA   NA   17
>>
>> Ben
>>
>>> Thanks
>>> Karim
>>>
>>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bigh...@gmail.com> wrote:
>>> Hi,
>>>
>>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezh...@gmail.com> wrote:
>>>
>>> > Dear All,
>>> > I am trying to get correlation between Diseases (80) in columns and
>>> > samples in rows (UNEQUAL) using gene expression (at least 1000, numeric). For
>>> > this I can use CORREP package with cor.unbalanced function.
>>> >
>>> > But before getting this final matrix I need to load and to store the
>>> > expression of 1000 genes for every Disease (80). Every disease has
>>> > different number of samples (between 50 - 500).
>>> >
>>> > Is it possible to get a cube of matrices with equal columns but unequal
>>> > rows? I think NO and I can't use array function.
>>> >
>>> > I am trying to get a list of matrices having the same number of columns but
>>> > different number of rows.
>>> > as follows:
>>> >
>>> > Cubist <- vector("list", 1)
>>> > Cubist$Expression <- vector("list", 1)
>>> >
>>> >
>>> > for (i in 1:80){
>>> >
>>> > matrix <- function(getGeneExpression[i])
>>> > Cubist$Expression[[Disease[i]]] <- matrix
>>> >
>>> > }
>>> >
>>> > At this step I have:
>>> > length(Cubist$Expression)
>>> > #80
>>> > dim(Cubist$Expression$Disease1)
>>> > #526 1000
>>> > dim(Cubist$Expression$Disease2)
>>> > #106 1000
>>> >
>>> > names(Cubist$Expression$Disease1[4])
>>> > #ABD
>>> >
>>> > names(Cubist$Expression$Disease2[4])
>>> > #ABD
>>> >
>>> > Now I need to build the final matrices for every gene (1000) that I will
>>> > use for CORREP function.
>>> >
>>> > Is there a way to extract directly the first column (first gene) for all
>>> > Diseases (80) from Cubist$Expression? or
>>> >
>>>
>>> I don't understand most of your question, but the above seems to be
>>> straightforward. Here's a toy example:
>>>
>>> # make a list of matrices, each with nCol columns and differing
>>> # number of rows, nRow
>>> nCol <- 3
>>> nRow <- sample(3:10, 5)
>>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)}, nCol)
>>> x
>>>
>>> # make a simple function to get a single column from a matrix
>>> getColumn <- function(x, colNum) {
>>>    return(x[,colNum])
>>> }
>>>
>>> # use the function to get the column from each matrix
>>> col1 <- lapply(x, getColumn, 1)
>>> col1
>>>
>>> Does that help answer this part of your question? If not, you may
>>> need to create a very small example of your data and post it here using
>>> the head() and dput() functions.
>>>
>>> Ben
>>>
>>> > I need to build 1000 matrices with 80 columns and unequal rows?
>>> >
>>> > Cublist$Diseases <- vector("list", 1)
>>> >
>>> > for (k in 1:1000){
>>> > for (i in 1:80){
>>> >
>>> > Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i] ]][k]
>>> > }
>>> >
>>> > }
>>> >
>>> > This double loop is time consuming... Is there a way to do this faster?
>>> >
>>> > Thanks,
>>> > karim
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org