Look at the sqldf or data.table packages. Lists are slow for lookup and not 
particularly memory-efficient. Numeric indexing into matrices or data 
frames is more typical in R, and both of those packages support indexing 
to speed up lookups. Also, consider carefully whether you can do your 
processing in bulk: vectorized or relational processing can be critical for 
performance.
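
As a rough sketch of what bulk processing could look like here, using base R 
only: split() groups all the values by number in a single vectorized call. The 
fixed seed and the loop-free data generation are assumptions made for 
illustration; they are not part of the original code.

```r
# Sketch: build the number -> values lookup in bulk instead of one
# element at a time.
set.seed(1)  # assumption: fixed seed only so that runs are repeatable

# Same data shape as in the original post, generated without growing vectors.
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

starttime <- Sys.time()

# split() returns a list with one element per distinct number; the element
# names are the numbers converted to character strings.
d <- split(values, numbers)

endtime <- Sys.time()
print(format(endtime - starttime))

# Lookup by name, matching the l[[toString(number)]] idiom from the question.
key <- toString(numbers[1])
d[[key]]
```

The result can be used the same way as the list built in the loop, but no 
vector is grown inside a loop, which is where most of the time was going.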
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On October 30, 2014 8:17:59 AM PDT, Thomas Nyberg <tomnyb...@gmail.com> wrote:
>Hello,
>
>I want to do the following: Given a set of (number, value) pairs, I want
>to create a list l so that l[[toString(number)]] returns the vector of
>values associated to that number. It is hundreds of times slower than
>the equivalent that I would write in python. I'm pretty new to R so I
>bet I'm using its data structures inefficiently, but I've tried more or
>less everything I can think of and can't really speed it up. I have done
>some profiling which helped me find problem areas, but I couldn't speed
>things up even with that information. I'm thinking I'm just
>fundamentally using R in a silly way.
>
>I've included code for the different versions. I wrote the python code 
>in a style to make it as clear to R programmers as possible. Thanks a 
>lot! Any help would be greatly appreciated!
>
>Cheers,
>Thomas
>
>
>R code (with two versions depending on commenting):
>
>-----
>
>numbers <- numeric(0)
>for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
>}
>
>values <- numeric(0)
>for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
>}
>
>starttime <- Sys.time()
>
>d = list()
>for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     if (is.null(d[[number]])) {
>     #if (!(number %in% names(d))) {
>         d[[number]] <- c(value)
>     } else {
>         d[[number]] <- append(d[[number]], value)
>     }
>}
>
>endtime <- Sys.time()
>
>print(format(endtime - starttime))
>
>-----
>
>uncommented version (is.null test): "45.64791 secs"
>commented version (%in% test): "1.423056 mins"
>
>
>
>Another version of R code:
>
>-----
>
>numbers <- numeric(0)
>for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
>}
>
>values <- numeric(0)
>for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
>}
>
>starttime <- Sys.time()
>
>d = list()
>for (number in unique(numbers)) {
>     d[[toString(number)]] <- numeric(0)
>}
>for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     d[[number]] <- append(d[[number]], value)
>}
>
>endtime <- Sys.time()
>
>print(format(endtime - starttime))
>
>-----
>
>"47.15579 secs"
>
>
>
>The python code:
>
>-----
>
>import random
>import time
>
>numbers = []
>for i in range(5):
>     numbers += random.sample(range(30000), 10000)
>
>values = []
>for i in range(len(numbers)):
>     values.append(random.randint(1, 10))
>
>starttime = time.time()
>
>d = {}
>for i in range(len(numbers)):
>     number = numbers[i]
>     value = values[i]
>     if number in d:
>         d[number].append(value)
>     else:
>         d[number] = [value]
>
>endtime = time.time()
>
>print(endtime - starttime, "seconds")
>
>-----
>
>0.123021125793 seconds
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
