Re: [R] ideas about how to reduce RAM & improve speed in trying to use lapply(strsplit())

Peter Ehlers Mon, 30 May 2011 11:03:43 -0700

On 2011-05-29 23:08, Matthew Keller wrote:

God this listserve is awesome. Thanks to everyone for their ideas.
I'll speed & memory test tomorrow and change the code. Thanks again!


Since you're dealing with a vector of ~1e8 elements, you might
find that (probably at a small cost in time) you can reduce the
memory requirements by processing the vector in pieces:

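## adjust n to suit trade-off between memory usage and time
n <- 100
## chunk size, rounded up so no elements are dropped when
## length(x) is not a multiple of n
k <- ceiling(length(x) / n)
L <- vector("list", n)
for( i in seq_len(n) ) {
  lo <- (i - 1) * k + 1
  if( lo > length(x) ) break   # fewer than n chunks were needed
  y <- x[lo:min(i * k, length(x))]
  L[[i]] <- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE)
}
newx <- unlist(L)   # unused (NULL) slots are dropped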
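
The loop above could also be wrapped in a small helper that works for any vector length. This is only a minimal sketch: the function name `strip_after_dot_chunked` and its `n_chunks` argument are made up for illustration and are not part of the suggestion itself.

```r
# Hypothetical helper: apply the regex chunk by chunk to limit peak memory.
strip_after_dot_chunked <- function(x, n_chunks = 100) {
  if (length(x) == 0L) return(character(0))
  # Chunk size, rounded up so that every element is covered.
  k <- ceiling(length(x) / n_chunks)
  starts <- seq(1, length(x), by = k)
  out <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    # The last chunk may be shorter than k.
    idx <- starts[i]:min(starts[i] + k - 1, length(x))
    out[[i]] <- gsub("^(.*?)\\..*$", "\\1", x[idx], perl = TRUE)
  }
  unlist(out)
}

x <- c("18x.6", "12x.9", "302x.3")
res <- strip_after_dot_chunked(rep(x, 5), n_chunks = 4)
```

Larger values of `n_chunks` should lower the size of each temporary result at the price of more loop iterations; the right setting depends on the machine and would need to be measured.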


Peter Ehlers


Matt

On Sun, May 29, 2011 at 6:44 PM, Ian Gow <iand...@gmail.com> wrote:

Not a new approach, but some benchmark data (perl=TRUE speeds up Jim's
suggestion):

x <- c('18x.6','12x.9','302x.3')
y <- rep(x,100000)
system.time(temp <- unlist(lapply(strsplit(y,".",fixed=TRUE),function(x) x[1])))

   user  system elapsed
  1.203   0.018   1.222

system.time(temp2 <- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE))

   user  system elapsed
  0.176   0.001   0.176

identical(temp2, temp)

[1] TRUE

system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y))

   user  system elapsed
  0.292   0.001   0.291

identical(temp3, temp)

[1] TRUE

system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))

   user  system elapsed
  0.160   0.001   0.161
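
The gsub() variants above all go through a regular-expression engine. A different route, not benchmarked in this thread, is to find the first literal "." with regexpr(fixed=TRUE) and cut the string there with substr(). The sketch below is offered only as something that might be worth timing; the variable names `pos`, `cut` and `temp4` are illustrative.

```r
# Alternative sketch: locate the first literal "." and keep what precedes it.
y <- rep(c("18x.6", "12x.9", "302x.3"), 100000)

# Position of the first "." in each string, or -1 where there is none.
pos <- as.integer(regexpr(".", y, fixed = TRUE))

# Last character to keep; strings without a "." are kept whole.
cut <- pos - 1L
no_dot <- pos < 0L
cut[no_dot] <- nchar(y[no_dot])

temp4 <- substr(y, 1L, cut)
```

Whether this is faster or lighter on memory than the perl=TRUE call is an open question here; system.time() on the real data would settle it.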






On 5/29/11 7:40 PM, "jim holtman" <jholt...@gmail.com> wrote:

Try this approach:

x <- c('18x.6','12x.9','302x.3')
gsub("^(.*)\\..*", '\\1', x)

[1] "18x"  "12x"  "302x"
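
One caveat worth checking on real data: the pattern above uses a greedy `.*`, so it keeps everything before the last ".", whereas the strsplit() approach and the non-greedy `.*?` pattern keep everything before the first ".". With exactly one "." per string, as in the example, the results agree. The input "1.2.3" below is an invented example to show the difference.

```r
# "1.2.3" is a made-up input containing more than one "."
z <- c("18x.6", "1.2.3")

# Greedy: keeps the text before the LAST "."
greedy <- gsub("^(.*)\\..*", "\\1", z)

# Non-greedy (needs perl = TRUE here): keeps the text before the FIRST "."
lazy <- gsub("^(.*?)\\..*$", "\\1", z, perl = TRUE)

# Reference result from the strsplit() approach
first <- vapply(strsplit(z, ".", fixed = TRUE), function(p) p[1], character(1))

greedy   # "18x" "1.2"
lazy     # "18x" "1"
```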


On Sun, May 29, 2011 at 8:10 PM, Matthew Keller <mckellerc...@gmail.com> wrote:

hi all,

I'm full of questions today :). Thanks in advance for your help!

Here's the problem:
x <- c('18x.6','12x.9','302x.3')

I want to get a vector that is c('18x','12x','302x')

This is easily done using this code:

unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))

So far so good. The problem is that x is a vector of length 132e6.
When I run the above code, it runs for > 30 minutes, and it takes > 23
Gb RAM (no kidding!).
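
A likely contributor to the footprint (an assumption, not something measured on the 132e6-element vector) is the intermediate list that strsplit() builds: one small character vector per input string, each carrying its own overhead. A rough way to see the effect on a small sample, using the hypothetical name `pieces`:

```r
# Small stand-in for the real vector
x <- rep(c("18x.6", "12x.9", "302x.3"), 1e5)

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(x, ".", fixed = TRUE)

# Compare the size of the input with the size of the intermediate list
size_x <- object.size(x)
size_pieces <- object.size(pieces)
print(size_x, units = "MB")
print(size_pieces, units = "MB")
```

The exact numbers depend on the R version and platform, so none are quoted here.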

Does anyone have ideas about how to speed up the code above and (more
importantly) reduce the RAM footprint? I'd prefer not to change the
file on disk using, e.g., awk, but I will do that as a last resort.

Best

Matt

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

