Hi Matt,

There are likely more efficient ways still, but this is a big
performance boost time-wise for me:

x <- c('18x.6','12x.9','302x.3')

gsub("\\.(.+$)", "", x)

x <- rep(x, 10^5)
> system.time(out1 <- unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1])))
   user  system elapsed
   2.89    0.03    2.96
> system.time(out2 <- gsub("\\.(.+$)", "", x))
   user  system elapsed
   0.57    0.00    0.59
> all.equal(out1, out2)
[1] TRUE

Cheers,

Josh

On Sun, May 29, 2011 at 5:10 PM, Matthew Keller <mckellerc...@gmail.com> wrote:
> hi all,
>
> I'm full of questions today :). Thanks in advance for your help!
>
> Here's the problem:
> x <- c('18x.6','12x.9','302x.3')
>
> I want to get a vector that is c('18x','12x','302x')
>
> This is easily done using this code:
>
> unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
>
> So far so good. The problem is that x is a vector of length 132e6.
> When I run the above code, it runs for > 30 minutes, and it takes > 23
> Gb RAM (no kidding!).
>
> Does anyone have ideas about how to speed up the code above and (more
> importantly) reduce the RAM footprint? I'd prefer not to change the
> file on disk using, e.g., awk, but I will do that as a last resort.
>
> Best
>
> Matt
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
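
A possible variation on the gsub() call above, offered as a sketch rather than a tested recommendation. Because the pattern is anchored at the end of the string, it can match at most once per element, so sub() should be enough. The pattern "\\.(.+$)" also requires at least one character after the dot, so a string such as '18x.' would be left unchanged, whereas the strsplit() version returns '18x'; the pattern "\\..*$" below avoids that difference. The second variant skips the regex engine and uses a fixed-string search. The helper name strip_after_dot is hypothetical and does not come from the thread, and neither variant has been timed or memory-profiled on a vector of length 132e6.

```r
# Hypothetical helper (name is an assumption, not from the thread):
# keep the part of each string before the first literal ".".
strip_after_dot <- function(x) {
  # Position of the first "." in each element; -1 where there is none.
  pos <- as.integer(regexpr(".", x, fixed = TRUE))
  out <- x
  # Only touch elements that actually contain a dot.
  idx <- which(pos > 0L)
  out[idx] <- substr(x[idx], 1L, pos[idx] - 1L)
  out
}

x <- c('18x.6', '12x.9', '302x.3')

# Regex variant: drop everything from the first dot to the end.
out_sub <- sub("\\..*$", "", x)

# Fixed-string variant: no regular expression involved.
out_fixed <- strip_after_dot(x)

print(out_sub)
print(out_fixed)
```

If memory is the main concern, either call could presumably be applied to x in chunks, but that is an assumption to check against the real data.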