Guys, let me add my two cents to your interesting discussion. I have a ~10 GB txt file with training data for my model: about 150 million rows and 12 variables. I load it into memory with a single line:

train <- read.table(file = "/training.txt")

While loading it takes ~28 GB of RAM and about 2 hours to finish; once the data are loaded, the rsession process holds ~14 GB. I can't even imagine how much it will take when I run SVM training on this data set. Is there any optimization to decrease the time required to load the data into memory? I am on a 32 GB RAM x64 box.

Thank you,
-Alex
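A minimal sketch of how the load itself might be sped up, assuming the file is whitespace-delimited with 12 numeric columns and no header (none of this is stated above, so adjust to the real layout):

# Giving read.table the column types and row count up front avoids the
# type-guessing and repeated re-allocation passes over a 150-million-row file.
col_types <- rep("numeric", 12)   # assumption: all 12 columns are numeric
train <- read.table("/training.txt", header = FALSE,
                    colClasses = col_types, nrows = 1.5e8,
                    comment.char = "", quote = "")

# After the first successful load, saving the parsed object avoids
# re-parsing the text file in later sessions:
# save(train, file = "training.RData")   # later: load("training.RData")

Setting comment.char = "" and quote = "" also skips scanning for comments and quote characters, which is noticeable at this scale.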
________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Kurinji Pandiyan [kurinji.pandi...@gmail.com]
Sent: 27 March 2012 18:14
To: R. Michael Weylandt
Cc: r-help@r-project.org
Subject: Re: [R] Memory Utilization on R

Thank you for the modified script! I have now tried it on different datasets and it works very well and is dramatically faster than my original script! I really appreciate the help.

Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt <michael.weyla...@gmail.com> wrote:
> Taking a look at your script: there are some potential optimizations
> you can do:
>
> # Fine
> poi <- as.character(top.GSM396290)  # 5000 characters
> x.data <- h1[, c(1, 7:9)]           # 485577 obs of 4 variables
>
> # Pre-allocate the space
> x <- vector("list", 485577)  # instead of x <- list()
>
> # Do the "a" stuff once outside the loop so you aren't doing it 485577 times
> a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
>
> # Let's use an apply statement instead of a for loop;
> # vapply is the fastest since we prespecify the return type.
> x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
>
> I think this will do what you wanted (and hopefully much faster).
>
> Note that you could probably tune this further, but I think this
> strikes a good balance between clarity and performance (for now).
>
> Hope this helps,
>
> Michael
>
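A self-contained toy version of the vapply() filter above, for readers who want to run the pattern in isolation; the probe IDs and gene names are made up and stand in for h1/x.data and top.GSM396290/poi:

# Made-up stand-ins for the annotation table and the genes of interest
x.data <- data.frame(ID = c("cg01", "cg02", "cg03"),
                     UCSC_REFGENE_NAME = c("TP53;TP53", "BRCA1", "GAPDH;ACTB"),
                     stringsAsFactors = FALSE)
poi <- c("TP53", "ACTB")

# Split every semicolon-separated gene list once, outside any loop
a <- strsplit(x.data$UCSC_REFGENE_NAME, ";")

# Keep the rows whose gene list contains at least one gene of interest
keep <- vapply(a, function(g) any(poi %in% g), logical(1))
x.data[keep, ]   # returns rows 1 and 3 in this toy example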
> On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
> <kurinji.pandi...@gmail.com> wrote:
> >
> > Thank you for the input.
> >
> > As it were, I realized that my script is utilizing a lot more memory than
> > I claimed - it was initially using 3 GB but has gone up to 20.24 GB active,
> > with 29.63 GB assigned to the R session.
> >
> > The script has run overnight and now I don't think it is active anymore,
> > since I keep getting the error message that I am out of startup disk space
> > for application memory.
> >
> > I am attaching screen shots of my RAM usage distribution (given that there
> > is no fluctuation in the usage by the R session, I believe it is not
> > running anymore) and of my available HD.
> >
> > Here is my script -
> >
> > poi <- as.character(top.GSM396290)  # 5000 characters
> > x.data <- h1[, c(1, 7:9)]           # 485577 obs of 4 variables
> > head(x.data)
> >
> > x <- list()
> >
> > for(i in 1:485577){
> >   a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
> >   a <- unlist(strsplit(a, ";"))
> >   if(any(poi %in% a) == TRUE) {x[[i]] <- x.data[i,]}
> > }
> >
> > # this step completed in a few hours
> >
> > x <- do.call(rbind, x)  # this step has been running overnight and is
> > still stuck
> >
> > Thanks, I really appreciate the help.
> > Kurinji
> >
> > On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
> > <michael.weyla...@gmail.com> wrote:
> >>
> >> Well... what makes you think you are hitting memory constraints then?
> >> If you have significantly less than 3 GB of data, it shouldn't surprise
> >> you if R never needs more than 3 GB of memory.
> >>
> >> You could just be running your scripts inefficiently... it's an extreme
> >> example, but all the memory and gigaflopping in the world can't speed
> >> this up (by much):
> >>
> >> for(i in seq_len(1e6)) Sys.sleep(10)
> >>
> >> Perhaps you should look into profiling tools or parallel
> >> computation... if you can post a representative example of your
> >> scripts, we might be able to give performance pointers.
> >>
> >> Michael
> >>
> >> On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
> >> <kurinji.pandi...@gmail.com> wrote:
> >> > Yes, I am.
> >> >
> >> > Thank you,
> >> > Kurinji
> >> >
> >> > On Mar 22, 2012, at 10:27 PM, "R. Michael Weylandt"
> >> > <michael.weyla...@gmail.com> wrote:
> >> >
> >> >> Use 64-bit R?
> >> >>
> >> >> Michael
> >> >>
> >> >> On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
> >> >> <kurinji.pandi...@gmail.com> wrote:
> >> >>> Hello,
> >> >>>
> >> >>> I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad-core processor and
> >> >>> 2 TB of storage. Despite having so much memory, I am not able to get
> >> >>> R to utilize much more than 3 GB. Some of my scripts take hours to
> >> >>> run, but I would think they would be much faster if more memory were
> >> >>> utilized. How do I optimize the memory usage of R on my Mac Pro?
> >> >>>
> >> >>> Thank you!
> >> >>> Kurinji

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
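As a postscript to Michael's suggestion about profiling tools earlier in the thread: a minimal sketch of the built-in Rprof() workflow. The my_analysis() function is a made-up stand-in for whatever slow script is being investigated.

# Stand-in workload so the sketch is runnable; replace with the real script
my_analysis <- function() {
  for (i in 1:500) s <- svd(matrix(rnorm(4e4), 200, 200))
  invisible(s)
}

Rprof("profile.out")          # start the sampling profiler, writing to profile.out
my_analysis()
Rprof(NULL)                   # stop profiling
summaryRprof("profile.out")   # see which functions the time was spent in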