You are right, guys, merge is working. Somehow I was under the erroneous impression that because the second data frame (myinfo) contains no column 'myid' merge will not work. Below is the cleaner code and comparison:
######################################### ### Example with smaller data frames ######################################### set.seed(123) mydata <- data.frame(myid = 1001:1020, version = sample(1:10, 20, replace = T)) head(mydata) table(mydata$version) set.seed(12) myinfo <- data.frame(version = sort(rep(1:10, 5)), a = rnorm(50), b = rnorm(50), c = rnorm(50), d = rnorm(50)) head(myinfo, 40) table(myinfo$version) ###---------------------------------------- ### METHOD 1 - Looping through each id of mydata and grabbing ### all columns of myinfo for the corresponding 'version': # Create placeholder list for the results: result <- split(mydata[c("myid", "version")], f = list(mydata$myid)) length(result) (result)[1:3] # Looping through each element of 'result': for(i in 1:length(result)){ id <- result[[i]]$myid result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ] result[[i]]$myid <- id result[[i]] <- result[[i]][c(6, 1:5)] } result <- do.call(rbind, result) result.order <- arrange(result, myid, version, a, b, c, d) head(result.order) # This is the desired result ###---------------------------------------- ### METHOD 2 - merge my.merge <- merge(myinfo, mydata, by="version") names(my.merge) result2 <- my.merge[,c("myid", "version", "a", "b", "c", "d")] names(result2) result2.order <- arrange(result2, myid, version, a, b, c, d) dim(result2.order) head(result2.order) # Same result? all.equal(result.order, result2.order) On Tue, Dec 22, 2015 at 3:34 PM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote: > I know I am overwriting. > merge doesn't solve it because each version in mydata is given to more > than one id. Hence, I thought I can't merge by version. > I am not sure how to answer the question about "the problem". > I described the current state and the desired state. If possible, I'd > like to get from the current state to the desired state faster than > when using a loop. > > On Tue, Dec 22, 2015 at 2:26 PM, jim holtman <jholt...@gmail.com> wrote: >> You seem to be saving 'myid' and then overwriting it with the last >> statement: >> >> result[[i]] <- result[[i]][c(5, 1:4)] >> >> Why doesn't 'merge' work for you? I tried it on your data, and seem to get >> back the same number of rows; may not be in the same order, but the content >> looks the same, and it does have 'myid' on it. >> >> >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> Tell me what you want to do, not how you want to do it. >> >> On Tue, Dec 22, 2015 at 12:27 PM, Dimitri Liakhovitski >> <dimitri.liakhovit...@gmail.com> wrote: >>> >>> Hello! >>> I have a solution for my task that is based on a loop. However, it's >>> too slow for my real-life problem that is much larger in scope. >>> However, I cannot use merge. Any advice on how to do it faster? >>> Thanks a lot for any hint on how to speed it up! >>> >>> # I have 'mydata' data frame: >>> set.seed(123) >>> mydata <- data.frame(myid = 1001:1100, >>> version = sample(1:20, 100, replace = T)) >>> head(mydata) >>> table(mydata$version) >>> >>> # I have 'myinfo' data frame that contains information for each 'version': >>> set.seed(12) >>> myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60), b = >>> rnorm(60), >>> c = rnorm(60), d = rnorm(60)) >>> head(myinfo, 40) >>> >>> ### MY SOLUTION WITH A LOOP: >>> ### Looping through each id of mydata and grabbing >>> ### all columns from 'myinfo' for the corresponding 'version': >>> >>> # 1. Creating placeholder list for the results: >>> result <- split(mydata[c("myid", "version")], f = list(mydata$myid)) >>> length(result) >>> (result)[1:3] >>> >>> >>> # 2. Looping through each element of 'result': >>> for(i in 1:length(result)){ >>> id <- result[[i]]$myid >>> result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ] >>> result[[i]]$myid <- id >>> result[[i]] <- result[[i]][c(5, 1:4)] >>> } >>> result <- do.call(rbind, result) >>> head(result) # This is the desired result >>> >>> -- >>> Dimitri Liakhovitski >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> > > > > -- > Dimitri Liakhovitski -- Dimitri Liakhovitski ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.