Note that you can actually drop the line defining the big list "x". I thought it would be needed, but after cleaning up the second half it turns out to be unnecessary, and skipping that allocation should save you even more time.
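Untested from this end, but with that line gone the whole thing should reduce to something like this (same objects as in the script quoted below):

poi <- as.character(top.GSM396290)   # 5000 characters
x.data <- h1[, c(1, 7:9)]            # 485577 obs. of 4 variables

# split the semicolon-separated gene annotations once, up front
a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")

# keep only the rows that mention at least one identifier in poi
x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]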
Best,
Michael

On Tue, Mar 27, 2012 at 11:14 AM, Kurinji Pandiyan
<kurinji.pandi...@gmail.com> wrote:
> Thank you for the modified script! I have now tried it on different
> datasets and it works very well and is dramatically faster than my
> original script!
>
> I really appreciate the help.
> Kurinji
>
> On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt
> <michael.weyla...@gmail.com> wrote:
>>
>> Taking a look at your script, there are some potential optimizations
>> you can make:
>>
>> # Fine
>> poi <- as.character(top.GSM396290)  # 5000 characters
>> x.data <- h1[, c(1, 7:9)]           # 485577 obs. of 4 variables
>>
>> # Pre-allocate the space (instead of x <- list())
>> x <- vector("list", 485577)
>>
>> # Do the "a" step once, outside the loop, so you aren't doing it
>> # 485577 times
>> a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
>>
>> # Let's use an apply statement instead of a for loop;
>> # vapply is the fastest since we prespecify the return type
>> x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
>>
>> I think this will do what you wanted (and hopefully much faster).
>>
>> Note that you could probably tune this further, but I think this
>> strikes a good balance between clarity and performance (for now).
>>
>> Hope this helps,
>>
>> Michael
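As a tiny, self-contained illustration of that vapply() filter (the toy data frame, probe IDs, and gene names below are made up purely for demonstration):

# made-up miniature version of x.data
toy <- data.frame(
  TargetID          = c("cg01", "cg02", "cg03"),
  UCSC_REFGENE_NAME = c("TP53;TP53", "BRCA1", "GAPDH;ACTB"),
  stringsAsFactors  = FALSE
)
poi <- c("BRCA1", "ACTB")   # made-up identifiers of interest

a <- strsplit(toy$UCSC_REFGENE_NAME, ";")
toy[vapply(a, function(x) any(poi %in% x), logical(1)), ]
# returns the cg02 and cg03 rows, since only they mention a gene in poi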
>> On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
>> <kurinji.pandi...@gmail.com> wrote:
>> >
>> > Thank you for the input.
>> >
>> > As it turns out, my script is using a lot more memory than I
>> > claimed - it was initially using 3 GB, but that has gone up to
>> > 20.24 GB active, with 29.63 GB assigned to the R session.
>> >
>> > The script has run overnight and I don't think it is active anymore,
>> > since I keep getting the error message that I am out of startup disk
>> > space for application memory.
>> >
>> > I am attaching screen shots of my RAM usage distribution (given that
>> > there is no fluctuation in the usage by the R session, I believe it
>> > is not running anymore) and of my available HD.
>> >
>> > Here is my script -
>> >
>> > poi <- as.character(top.GSM396290)  # 5000 characters
>> > x.data <- h1[, c(1, 7:9)]           # 485577 obs. of 4 variables
>> > head(x.data)
>> >
>> > x <- list()
>> >
>> > for(i in 1:485577){
>> >   a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
>> >   a <- unlist(strsplit(a, ";"))
>> >   if(any(poi %in% a)) { x[[i]] <- x.data[i, ] }
>> > }
>> >
>> > # this step completed in a few hours
>> >
>> > x <- do.call(rbind, x)  # this step has been running overnight and
>> >                         # is still stuck
>> >
>> > Thanks, I really appreciate the help.
>> > Kurinji
>> >
>> > On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
>> > <michael.weyla...@gmail.com> wrote:
>> >>
>> >> Well... what makes you think you are hitting memory constraints
>> >> then? If you have significantly less than 3 GB of data, it
>> >> shouldn't surprise you if R never needs more than 3 GB of memory.
>> >>
>> >> You could just be running your scripts inefficiently... it's an
>> >> extreme example, but all the memory and gigaflopping in the world
>> >> can't speed this up (by much):
>> >>
>> >> for(i in seq_len(1e6)) Sys.sleep(10)
>> >>
>> >> Perhaps you should look into profiling tools or parallel
>> >> computation... if you can post a representative example of your
>> >> scripts, we might be able to give performance pointers.
>> >>
>> >> Michael
>> >>
>> >> On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
>> >> <kurinji.pandi...@gmail.com> wrote:
>> >> > Yes, I am.
>> >> >
>> >> > Thank you,
>> >> > Kurinji
>> >> >
>> >> > On Mar 22, 2012, at 10:27 PM, "R. Michael Weylandt"
>> >> > <michael.weyla...@gmail.com> wrote:
>> >> >
>> >> >> Use 64-bit R?
>> >> >>
>> >> >> Michael
>> >> >>
>> >> >> On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
>> >> >> <kurinji.pandi...@gmail.com> wrote:
>> >> >>> Hello,
>> >> >>>
>> >> >>> I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad-core processor
>> >> >>> and 2 TB of storage. Despite having so much memory, I am not
>> >> >>> able to get R to use much more than 3 GB. Some of my scripts
>> >> >>> take hours to run, but I would think they would be much faster
>> >> >>> if more memory were used. How do I optimize R's memory usage on
>> >> >>> my Mac Pro?
>> >> >>>
>> >> >>> Thank you!
>> >> >>> Kurinji
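For completeness, a few standard base-R checks that help when memory looks like the bottleneck (nothing here is specific to this dataset; x.data is the large data frame from the thread):

# confirm the session really is a 64-bit build
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on 32-bit
R.version$arch            # e.g. "x86_64"

# see how much memory R is currently holding and its high-water mark
gc()                      # the "max used" column is the peak so far

# check the size of a specific object
print(object.size(x.data), units = "Mb")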