Thanks for the help, everyone! For the moment I'll stick with
Bill's solution, but I'll check out the other recommendations as well.
Regarding the issue of slow lookups for lists, are there any hash map
implementations in R that are faster? I like to use fairly simple logic
and data structures when prototyping, and to optimize code only when
and where it's necessary, which is why I'm curious about these basic objects.
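To make the lookup concern concrete, here is a minimal Python sketch. It assumes, as Thomas suspects, that finding a name in a named list means scanning the names one by one, whereas a hash map jumps to the right slot in roughly constant time. The helpers `linear_lookup` and `build_index` are hypothetical and only model the cost; they are not how R is implemented.

```python
def linear_lookup(names, values, key):
    """Find key by scanning the names in order; return (value, comparisons)."""
    comparisons = 0
    for i, name in enumerate(names):
        comparisons += 1
        if name == key:
            return values[i], comparisons
    # A key that is not present costs a full scan.
    return None, comparisons


def build_index(names):
    """Build a hash map from each name to its position (one pass)."""
    return {name: i for i, name in enumerate(names)}


names = [str(k) for k in range(30000)]
values = list(range(30000))
index = build_index(names)

# The last name needs 30000 comparisons with a linear scan...
value, comparisons = linear_lookup(names, values, "29999")
print(value, comparisons)        # 29999 30000
# ...while the hash map finds its position directly.
print(values[index["29999"]])    # 29999
```

Note that in the grouping loop from the original post every number seen for the first time is a miss, so under this model each new key pays for a full scan of the names collected so far.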
On another note, is there a vector-style implementation that changes
vectors in place? If I'm not mistaken, the append operation creates and
returns a new vector each time, which is in line with the functional nature
of R. If there were some way to make it mutable, it could be much
faster. This is fairly standard in many languages: behind the scenes,
memory is allocated at, say, twice the current size, so that you only
need log(n) reallocations when building up a vector one element at a time. Are there
any such equivalents in R? I presume that lists are mutable (am I
wrong?), but they seem to have the lookup slowdown problem.
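The capacity-doubling strategy described above can be sketched in Python as follows. This is only an illustration of the idea, not how R or Python manage memory internally: `GrowableVector` and `copy_on_append_cost` are hypothetical names, a plain Python list stands in for a raw memory buffer, and the capacity is assumed to start at 1 and double whenever it fills up.

```python
class GrowableVector:
    """Append-only vector that doubles its capacity when full (a sketch)."""

    def __init__(self):
        self._data = [None]      # backing storage with capacity 1
        self.size = 0            # number of slots actually in use
        self.reallocations = 0   # how many times the storage was replaced
        self.copies = 0          # elements copied during reallocations

    def append(self, x):
        if self.size == len(self._data):
            # Full: allocate twice the capacity and copy the old contents over.
            new_data = [None] * (2 * len(self._data))
            new_data[:self.size] = self._data[:self.size]
            self.copies += self.size
            self.reallocations += 1
            self._data = new_data
        self._data[self.size] = x
        self.size += 1

    def to_list(self):
        return self._data[:self.size]


def copy_on_append_cost(n):
    """Elements copied if every append builds a brand-new vector instead."""
    # The k-th append copies the k - 1 existing elements: 0 + 1 + ... + (n - 1).
    return n * (n - 1) // 2


v = GrowableVector()
for i in range(50000):
    v.append(i)
print(v.reallocations, v.copies)    # 16 65535
print(copy_on_append_cost(50000))   # 1249975000
```

With doubling, 50000 appends trigger only 16 reallocations and fewer than 2n element copies in total, whereas copying the whole vector on every append costs about n^2/2 copies.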
Again, thanks a lot!
Cheers,
Thomas
On 10/30/2014 12:05 PM, William Dunlap wrote:
Repeatedly extending vectors takes a lot of time. You can do what you want with
d2 <- split(values, factor(numbers, levels=unique(numbers)))
If you would like the labels on d2 to be in numeric order, then you can
simplify that to
d3 <- split(values, numbers)
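What the two split() calls compute can be sketched in Python as a single pass that groups values by key. This mirrors the result only, not how split() is implemented. The function `group_by_key` and its `order` parameter are hypothetical: `"first"` corresponds to the levels=unique(numbers) call (d2), which keeps numbers in order of first appearance, and `"sorted"` corresponds to the plain split(values, numbers) call (d3), whose labels come out in numeric order.

```python
def group_by_key(keys, values, order="first"):
    """Collect values into one list per key in a single pass.

    order="first" keeps keys in order of first appearance;
    order="sorted" returns them in ascending key order.
    """
    groups = {}
    for k, v in zip(keys, values):
        # setdefault creates the empty list the first time a key is seen.
        groups.setdefault(k, []).append(v)
    if order == "sorted":
        return {k: groups[k] for k in sorted(groups)}
    return groups  # dicts preserve insertion order (Python 3.7+)


numbers = [3, 1, 3, 2, 1]
values = [10, 20, 30, 40, 50]
print(group_by_key(numbers, values))            # {3: [10, 30], 1: [20, 50], 2: [40]}
print(group_by_key(numbers, values, "sorted"))  # {1: [20, 50], 2: [40], 3: [10, 30]}
```

One difference from R: split() labels the groups with the levels as character strings, while this sketch keeps the numeric keys.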
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyb...@gmail.com> wrote:
Hello,
I want to do the following: given a set of (number, value) pairs, I want to
create a list l such that l[[toString(number)]] returns the vector of values
associated with that number. My R implementation is hundreds of times slower
than the equivalent code I wrote in Python. I'm pretty new to R, so I bet I'm
using its data structures inefficiently, but I've tried more or less
everything I can think of and can't really speed it up. Profiling helped me
find the problem areas, but I couldn't speed things up even with that
information. I suspect I'm just using R in a fundamentally silly way.
I've included code for the different versions. I wrote the Python code in a
style meant to be as clear as possible to R programmers. Thanks a lot! Any
help would be greatly appreciated!
Cheers,
Thomas
R code (two versions, depending on which of the two if conditions is commented out):
-----
numbers <- numeric(0)
for (i in 1:5) {
  numbers <- c(numbers, sample(1:30000, 10000))
}
values <- numeric(0)
for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
}
starttime <- Sys.time()
d = list()
for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  if (is.null(d[[number]])) {
  # Alternative test; note the negation, so both tests mean "number not seen yet"
  #if (!(number %in% names(d))) {
    d[[number]] <- c(value)
  } else {
    d[[number]] <- append(d[[number]], value)
  }
}
endtime <- Sys.time()
print(format(endtime - starttime))
-----
uncommented version: "45.64791 secs"
commented version: "1.423056 mins"
Another version of the R code, which first creates an empty entry for every distinct number:
-----
numbers <- numeric(0)
for (i in 1:5) {
  numbers <- c(numbers, sample(1:30000, 10000))
}
values <- numeric(0)
for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
}
starttime <- Sys.time()
d = list()
for (number in unique(numbers)) {
  d[[toString(number)]] <- numeric(0)
}
for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  d[[number]] <- append(d[[number]], value)
}
endtime <- Sys.time()
print(format(endtime - starttime))
-----
"47.15579 secs"
The Python code:
-----
import random
import time

# Five batches of 10000 distinct numbers drawn from 0..29999
numbers = []
for i in range(5):
    numbers += random.sample(range(30000), 10000)
# One random value between 1 and 10 per number
values = []
for i in range(len(numbers)):
    values.append(random.randint(1, 10))

starttime = time.time()
d = {}
for i in range(len(numbers)):
    number = numbers[i]
    value = values[i]
    if number in d:
        d[number].append(value)
    else:
        d[number] = [value]
endtime = time.time()
print(endtime - starttime, "seconds")
-----
0.123021125793 seconds
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.