You can try using an environment instead of a list.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
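For readers following along, here is a minimal sketch of what the environment suggestion above might look like. It is not code from the thread: the helper name group_by_key_env is made up for illustration, and the small test data is invented. It relies only on base R behaviour: new.env() creates a hashed environment by default, and d[[key]] returns NULL for a key that has not been assigned yet, so c(NULL, value) starts a new vector.

```r
# Hypothetical helper (not from the thread): group `values` by `numbers`,
# using an environment as a hash map keyed by the number as a string.
group_by_key_env <- function(numbers, values) {
    d <- new.env(hash = TRUE)           # environments are mutable and hashed
    for (i in seq_along(numbers)) {
        key <- as.character(numbers[i])
        # d[[key]] is NULL for an unseen key, so c() starts a new vector
        d[[key]] <- c(d[[key]], values[i])
    }
    d
}

# Small made-up example
numbers <- c(3, 1, 3, 2, 1)
values  <- c(10L, 20L, 30L, 40L, 50L)
d <- group_by_key_env(numbers, values)

d[["3"]]      # values recorded for key "3"
sort(ls(d))   # the keys that were stored
```

If an ordinary named list is needed afterwards, as.list(d) converts the environment, although the order of the names is not guaranteed. For this particular grouping task the split() call quoted below is still likely to be simpler and faster; the environment is mainly useful when keyed lookups and updates have to happen incrementally.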
On Thu, Oct 30, 2014 at 10:02 AM, Thomas Nyberg <tomnyb...@gmail.com> wrote:
> Thanks to all for the help everyone! For the moment I'll stick with Bill's
> solution, but I'll check out the other recommendations as well.
>
> Regarding the issue of slow lookups for lists, are there any hash map
> implementations in R that are faster? I like using fairly simple logic and
> data structures when prototyping and then only optimize code when and where
> it's necessary, which is why I'm curious about these basic objects.
>
> On another note, is there a vector style implementation that changes the
> vectors in place? If I'm not mistaken, the append operation creates and
> returns a new vector each time, which is in line with the functional nature
> of R. If there were some way to have it mutable, it could be much faster.
> This is fairly standard in many languages. Behind the scenes memory is
> allocated at say 2 times the current size so that you only need log(n)
> extensions when building up a vector like this. Are there any such
> equivalents in R? I presume that lists are mutable (am I wrong?), but they
> seem to have the lookup slowdown problem.
>
> Again thanks a lot!
>
> Cheers,
> Thomas
>
>
> On 10/30/2014 12:05 PM, William Dunlap wrote:
>>
>> Repeatedly extending vectors takes a lot of time. You can do what you
>> want with
>>     d2 <- split(values, factor(numbers, levels=unique(numbers)))
>> If you would like the labels on d2 to be in numeric order then you can
>> simplify that to
>>     d3 <- split(values, numbers)
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyb...@gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I want to do the following: Given a set of (number, value) pairs, I want
>>> to create a list l so that l[[toString(number)]] returns the vector of
>>> values associated to that number. It is hundreds of times slower than the
>>> equivalent that I would write in python.
>>> I'm pretty new to R so I bet I'm using its data structures inefficiently,
>>> but I've tried more or less everything I can think of and can't really
>>> speed it up. I have done some profiling which helped me find problem
>>> areas, but I couldn't speed things up even with that information. I'm
>>> thinking I'm just fundamentally using R in a silly way.
>>>
>>> I've included code for the different versions. I wrote the python code in
>>> a style to make it as clear to R programmers as possible. Thanks a lot!
>>> Any help would be greatly appreciated!
>>>
>>> Cheers,
>>> Thomas
>>>
>>>
>>> R code (with two versions depending on commenting):
>>>
>>> -----
>>>
>>> numbers <- numeric(0)
>>> for (i in 1:5) {
>>>     numbers <- c(numbers, sample(1:30000, 10000))
>>> }
>>>
>>> values <- numeric(0)
>>> for (i in 1:length(numbers)) {
>>>     values <- append(values, sample(1:10, 1))
>>> }
>>>
>>> starttime <- Sys.time()
>>>
>>> d = list()
>>> for (i in 1:length(numbers)) {
>>>     number = toString(numbers[i])
>>>     value = values[i]
>>>     if (is.null(d[[number]])) {
>>>     #if (number %in% names(d)) {
>>>         d[[number]] <- c(value)
>>>     } else {
>>>         d[[number]] <- append(d[[number]], value)
>>>     }
>>> }
>>>
>>> endtime <- Sys.time()
>>>
>>> print(format(endtime - starttime))
>>>
>>> -----
>>>
>>> uncommented version: "45.64791 secs"
>>> commented version: "1.423056 mins"
>>>
>>>
>>>
>>> Another version of R code:
>>>
>>> -----
>>>
>>> numbers <- numeric(0)
>>> for (i in 1:5) {
>>>     numbers <- c(numbers, sample(1:30000, 10000))
>>> }
>>>
>>> values <- numeric(0)
>>> for (i in 1:length(numbers)) {
>>>     values <- append(values, sample(1:10, 1))
>>> }
>>>
>>> starttime <- Sys.time()
>>>
>>> d = list()
>>> for (number in unique(numbers)) {
>>>     d[[toString(number)]] <- numeric(0)
>>> }
>>> for (i in 1:length(numbers)) {
>>>     number = toString(numbers[i])
>>>     value = values[i]
>>>     d[[number]] <- append(d[[number]], value)
>>> }
>>>
>>> endtime <- Sys.time()
>>>
>>> print(format(endtime - starttime))
>>>
>>> -----
>>>
>>> "47.15579 secs"
>>>
>>>
>>>
>>> The python code:
>>>
>>> -----
>>>
>>> import random
>>> import time
>>>
>>> numbers = []
>>> for i in range(5):
>>>     numbers += random.sample(range(30000), 10000)
>>>
>>> values = []
>>> for i in range(len(numbers)):
>>>     values.append(random.randint(1, 10))
>>>
>>> starttime = time.time()
>>>
>>> d = {}
>>> for i in range(len(numbers)):
>>>     number = numbers[i]
>>>     value = values[i]
>>>     if d.has_key(number):
>>>         d[number].append(value)
>>>     else:
>>>         d[number] = [value]
>>>
>>> endtime = time.time()
>>>
>>> print endtime - starttime, "seconds"
>>>
>>> -----
>>>
>>> 0.123021125793 seconds
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.