Hello,

I want to do the following: given a set of (number, value) pairs, I want to create a list l so that l[[toString(number)]] returns the vector of values associated with that number. My R code for this is hundreds of times slower than the equivalent I would write in Python. I'm pretty new to R, so I suspect I'm using its data structures inefficiently, but I've tried more or less everything I can think of and can't speed it up. Profiling helped me find the slow spots, but I still couldn't make them faster even with that information. I suspect I'm just using R in a fundamentally silly way.
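To make the goal concrete, here is a tiny sketch of the structure I am after. The toy input and the hand-written list are made up purely for illustration; they are not my real data, and the list is typed out by hand rather than built by any particular method.

```r
# Toy input (hypothetical): three (number, value) pairs.
numbers <- c(3, 7, 3)
values  <- c(10, 20, 30)

# The list I would like to end up with, written out by hand:
# each name is a number converted to a string, and each element
# holds every value that was paired with that number.
l <- list("3" = c(10, 30), "7" = 20)

l[[toString(3)]]   # the values paired with 3
l[[toString(7)]]   # the value paired with 7
```

The question is how to build such a list quickly when numbers and values each hold tens of thousands of entries.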

I've included the code for the different versions below. I wrote the Python code in a style meant to be as clear as possible to R programmers. Thanks a lot! Any help would be greatly appreciated.

Cheers,
Thomas


R code (two versions, depending on which "if" line is commented out):

-----

numbers <- numeric(0)
for (i in 1:5) {
    numbers <- c(numbers, sample(1:30000, 10000))
}

values <- numeric(0)
for (i in 1:length(numbers)) {
    values <- append(values, sample(1:10, 1))
}

starttime <- Sys.time()

d = list()
for (i in 1:length(numbers)) {
    number = toString(numbers[i])
    value = values[i]
    if (is.null(d[[number]])) {
    #if (!(number %in% names(d))) {
        d[[number]] <- c(value)
    } else {
        d[[number]] <- append(d[[number]], value)
    }
}

endtime <- Sys.time()

print(format(endtime - starttime))

-----

uncommented version: "45.64791 secs"
commented version: "1.423056 mins"



Another version of R code:

-----

numbers <- numeric(0)
for (i in 1:5) {
    numbers <- c(numbers, sample(1:30000, 10000))
}

values <- numeric(0)
for (i in 1:length(numbers)) {
    values <- append(values, sample(1:10, 1))
}

starttime <- Sys.time()

d = list()
for (number in unique(numbers)) {
    d[[toString(number)]] <- numeric(0)
}
for (i in 1:length(numbers)) {
    number = toString(numbers[i])
    value = values[i]
    d[[number]] <- append(d[[number]], value)
}

endtime <- Sys.time()

print(format(endtime - starttime))

-----

"47.15579 secs"



The Python code:

-----

import random
import time

numbers = []
for i in range(5):
    numbers += random.sample(range(30000), 10000)

values = []
for i in range(len(numbers)):
    values.append(random.randint(1, 10))

starttime = time.time()

d = {}
for i in range(len(numbers)):
    number = numbers[i]
    value = values[i]
    if number in d:  # dict.has_key() no longer exists in Python 3
        d[number].append(value)
    else:
        d[number] = [value]

endtime = time.time()

print(endtime - starttime, "seconds")

-----

0.123021125793 seconds

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
