Hi, I have a solution based upon findInterval() which depends upon the ordered nature of the 'Day' column. I can't speak to whether or not it is efficient but it is handy. I love the findInterval() function but have often wished it works with look up tables in descending order. This function, find_interval(), is my first reasonable pass at working with descending order.
https://gist.github.com/btupper/5fc6cc4e7d86f39f9e4f ## start source("find_interval.R") x <- structure(list(Day = c(1L, 2L, 3L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L), Value = c(76, 116, 111, 103, 114, 99, 128, 96, 81, 84, 81, 108, 109, 106, 125, 128, 92, 90, 83, 89, 76, 89, 101, 93, 98, 77, 92)), .Names = c("Day", "Value"), row.names = c(NA, -27L), class = "data.frame") TESTDAYS <- data.frame(TestDay = c(4, 11, 15)) x <- x[rev(1:nrow(x)),] ix <- find_interval(TESTDAYS[,'TestDay'], x[,'Day']) TESTDAYS[,'TestValue'] <- x[ix,'Value'] TESTDAYS # TestDay TestValue # 1 4 103 # 2 11 84 # 3 15 109 ### end Will that do the trick with your large dataset? Ben > On Sep 29, 2016, at 9:38 AM, Dennis Fisher <fis...@plessthan.com> wrote: > > R 3.3.1 > OS X > > Colleagues, > > I have two large data frames that I am trying to link efficiently. A small > example is as follows: > > structure(list(Day = c(1L, 2L, 3L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, > 13L, 14L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, > 27L, 28L, 29L, 30L), Value = c(76, 116, 111, 103, 114, 99, 128, > 96, 81, 84, 81, 108, 109, 106, 125, 128, 92, 90, 83, 89, 76, > 89, 101, 93, 98, 77, 92)), .Names = c("Day", "Value"), row.names = c(NA, > -27L), class = "data.frame") > > which becomes: > Day Value > 1 1 76 > 2 2 116 > 3 3 111 > 4 5 103 > 5 6 114 > 6 7 99 > 7 8 128 > 8 9 96 > 9 10 81 > 10 11 84 > 11 13 81 > 12 14 108 > 13 16 109 > 14 17 106 > 15 18 125 > 16 19 128 > 17 20 92 > 18 21 90 > 19 22 83 > 20 23 89 > 21 24 76 > 22 25 89 > 23 26 101 > 24 27 93 > 25 28 98 > 26 29 77 > 27 30 92 > > The second dataframe is merely: > TESTDAYS <- data.frame(TestDay = c(4, 11, 15)) > > For each row in the second dataframe, I would like to identify the first row > in the first dataframe in which Day is >= TestDay. > For example, for TestDay == 4, Day would equal 5. I would then append the > corresponding “Value” in the TestValue column > The result would be: > TestDay TestValue > 1 4 103 > 2 11 84 > 3 15 109 > > I can accomplish this with brute force but I suspect that there is some > clever day to vectorize this. Any help would be appreciated. > > Dennis > > Dennis Fisher MD > P < (The "P Less Than" Company) > Phone / Fax: 1-866-PLessThan (1-866-753-7784) > www.PLessThan.com <http://www.plessthan.com/> > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Ben Tupper Bigelow Laboratory for Ocean Sciences 60 Bigelow Drive, P.O. Box 380 East Boothbay, Maine 04544 http://www.bigelow.org ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.