Bill beat me to it; I was just about to post the same thing. The R split() version is still slower than Python on my system, but the times are now on the same order of magnitude, about a tenth of a second in both cases.
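For comparison, here is a rough Python 3 sketch of the same grouping step. It is not the script from the original post (that one is Python 2 and checks d.has_key); the function name group_values, the use of collections.defaultdict, and the fixed seed are assumptions made for illustration, and timings will vary by machine.

```python
import random
import time
from collections import defaultdict


def group_values(numbers, values):
    """Map each number to the list of values paired with it, in input order."""
    groups = defaultdict(list)  # missing keys start out as empty lists
    for number, value in zip(numbers, values):
        groups[number].append(value)
    return dict(groups)


if __name__ == "__main__":
    random.seed(0)  # arbitrary seed, only so repeated runs use the same data

    # Same set-up as the original post: 5 draws of 10000 numbers, one value each.
    numbers = []
    for _ in range(5):
        numbers += random.sample(range(30000), 10000)
    values = [random.randint(1, 10) for _ in numbers]

    start = time.perf_counter()
    d = group_values(numbers, values)
    elapsed = time.perf_counter() - start

    print(f"{len(d)} groups in {elapsed:.3f} seconds")
```

For example, group_values([3, 1, 3], [10, 20, 30]) gives {3: [10, 30], 1: [20]}. That is the same grouping split(values, numbers) produces in R, except that split() sorts the labels and stores them as character names.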
You can also speed up the set-up part by sampling all at once instead of repeatedly, e.g.,

sample(1:10, length(numbers), replace=TRUE)

instead of

values <- numeric(0)
for (i in 1:length(numbers)) {
    values <- append(values, sample(1:10, 1))
}

Best,
Ista

On Thu, Oct 30, 2014 at 12:05 PM, William Dunlap <wdun...@tibco.com> wrote:
> Repeatedly extending vectors takes a lot of time. You can do what you want
> with
>     d2 <- split(values, factor(numbers, levels=unique(numbers)))
> If you would like the labels on d2 to be in numeric order then you can
> simplify that to
>     d3 <- split(values, numbers)
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyb...@gmail.com> wrote:
>> Hello,
>>
>> I want to do the following: Given a set of (number, value) pairs, I want to
>> create a list l so that l[[toString(number)]] returns the vector of values
>> associated to that number. It is hundreds of times slower than the
>> equivalent that I would write in python. I'm pretty new to R so I bet I'm
>> using its data structures inefficiently, but I've tried more or less
>> everything I can think of and can't really speed it up. I have done some
>> profiling which helped me find problem areas, but I couldn't speed things up
>> even with that information. I'm thinking I'm just fundamentally using R in a
>> silly way.
>>
>> I've included code for the different versions. I wrote the python code in a
>> style to make it as clear to R programmers as possible. Thanks a lot! Any
>> help would be greatly appreciated!
>>
>> Cheers,
>> Thomas
>>
>>
>> R code (with two versions depending on commenting):
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>     numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>     values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (i in 1:length(numbers)) {
>>     number = toString(numbers[i])
>>     value = values[i]
>>     if (is.null(d[[number]])) {
>>     #if (number %in% names(d)) {
>>         d[[number]] <- c(value)
>>     } else {
>>         d[[number]] <- append(d[[number]], value)
>>     }
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> uncommented version: "45.64791 secs"
>> commented version: "1.423056 mins"
>>
>>
>>
>> Another version of R code:
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>     numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>     values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (number in unique(numbers)) {
>>     d[[toString(number)]] <- numeric(0)
>> }
>> for (i in 1:length(numbers)) {
>>     number = toString(numbers[i])
>>     value = values[i]
>>     d[[number]] <- append(d[[number]], value)
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> "47.15579 secs"
>>
>>
>>
>> The python code:
>>
>> -----
>>
>> import random
>> import time
>>
>> numbers = []
>> for i in range(5):
>>     numbers += random.sample(range(30000), 10000)
>>
>> values = []
>> for i in range(len(numbers)):
>>     values.append(random.randint(1, 10))
>>
>> starttime = time.time()
>>
>> d = {}
>> for i in range(len(numbers)):
>>     number = numbers[i]
>>     value = values[i]
>>     if d.has_key(number):
>>         d[number].append(value)
>>     else:
>>         d[number] = [value]
>>
>> endtime = time.time()
>>
>> print endtime - starttime, "seconds"
>>
>> -----
>>
>> 0.123021125793 seconds
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.