I appreciate the timing, so much so that I changed the code to show the issue. It is a problem of scale.
roll_lm probably has a heavy start-up cost, but otherwise it completely out-performs
those other versions at scale. I suspect that in the small-data case you are mostly
timing that nearly constant start-up cost. I did give code to paint a picture, but it
was just cartoon code lifted from Stack Exchange. The real problem is closer to this:
30-day rolling windows on 24 daily (by-hour) measurements for 5 years, with 24 + 7 - 1
dummy predictor variables, and finally you need to do this for 300 sets of data. The
code below is closer to that, and roll_lm can handle that input in a timely manner.
You can do it with lm.fit, but you need to spend a lot of time waiting.

The issue of accuracy needs a follow-up check; I am not sure why the coefficients
would differ. Worth a check on that.

Thanks,
Jeremiah

library(rbenchmark)
library(zoo)
library(RcppArmadillo)
library(roll)

N = 30*24*12*5
window = 30*24
npred = 15   # 15 chosen arbitrarily...

set.seed(1)
x = matrix(rnorm(N*(npred+1)), ncol = npred+1)
colnames(x) <- c("y", paste0("x", 1:npred))
z <- zoo(x)

benchmark(roll_lm = roll_lm(coredata(z[, 1, drop = FALSE]),
                            coredata(z[, -1, drop = FALSE]),
                            window, center = FALSE),
          replications = 3)

Which comes out as:

     test replications elapsed relative user.self sys.self user.child sys.child
1 roll_lm            3   6.273        1    38.312    0.654          0         0

## You aren't going to get that below...
benchmark(fastLm = rollapplyr(z, width = window,
                              function(x) coef(fastLm(cbind(1, x[, -1]), x[, 1])),
                              by.column = FALSE),
          lm = rollapplyr(z, width = window,
                          function(x) coef(lm(y ~ ., data = as.data.frame(x))),
                          by.column = FALSE),
          replications = 3)
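As a quick sketch of that accuracy follow-up (reusing N, window, and z from above):
if I am reading the roll documentation right, the signature is roll_lm(x, y, width, ...)
with the predictors first, so a call that passes the response as the first argument
would produce coefficients that cannot match lm's, which may be all the discrepancy is.

## Sketch of an accuracy check; assumes roll_lm(x, y, ...) takes the
## predictor matrix first and returns a coefficient matrix with one row
## per window end.
m   <- coredata(z)
fit <- roll_lm(m[, -1, drop = FALSE], m[, 1, drop = FALSE],
               window, center = FALSE)

## Refit the final window with lm.fit and compare coefficient vectors.
idx <- (N - window + 1):N
ref <- lm.fit(cbind(1, m[idx, -1]), m[idx, 1])
max(abs(tail(fit$coefficients, 1) - coef(ref)))  # near zero if they agree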
On Thu, Jul 21, 2016 at 1:28 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> I would be careful about making assumptions regarding what is faster.
> Performance tends to be nonintuitive.
>
> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
> you provided, rollapply/fastLm was three times faster than roll_lm. Of
> course this could change with data of different dimensions, but it
> would be worthwhile to do actual benchmarks before making assumptions.
>
> I also noticed that roll_lm did not give the same coefficients as the
> other two.
>
> set.seed(1)
> library(zoo)
> library(RcppArmadillo)
> library(roll)
> library(rbenchmark)
>
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> benchmark(fastLm = rollapplyr(z, width = 4,
>                               function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>                               by.column = FALSE),
>           lm = rollapplyr(z, width = 4,
>                           function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>                           by.column = FALSE),
>           roll_lm = roll_lm(coredata(z[, 1, drop = FALSE]),
>                             coredata(z[, 2, drop = FALSE]), 4,
>                             center = FALSE))[1:4]
>
>      test replications elapsed relative
> 1  fastLm          100    0.22    1.000
> 2      lm          100    0.72    3.273
> 3 roll_lm          100    0.64    2.909
>
> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds <roundsjerem...@gmail.com> wrote:
> > Thanks all. roll::roll_lm was essentially what I wanted. I think maybe
> > I would prefer it to have options to return a few more things, but it
> > returns the coefficients, and the remaining statistics you might want
> > can be calculated fast enough from there.
> >
> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:
> >
> >> Jeremiah,
> >>
> >> for this purpose there are the "roll" and "RcppRoll" packages. Both use
> >> Rcpp, and the former also provides rolling lm models. The latter has a
> >> generic interface that lets you define your own function.
> >>
> >> One thing to pay attention to, though, is the numerical reliability.
> >> Especially on large time series with relatively short windows there is
> >> a good chance of encountering numerically challenging situations. The
> >> QR decomposition used by lm is fairly robust, while other more
> >> straightforward matrix multiplications may not be [illustrated briefly
> >> below, after the quoted thread]. This should be kept in mind when
> >> writing your own Rcpp code for plugging into RcppRoll.
> >>
> >> But I haven't checked what the roll package does or how reliable it
> >> is...
> >>
> >> hth,
> >> Z
> >>
> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
> >>
> >>> Hi,
> >>>
> >>> A not unusual task is performing a multiple regression in a rolling
> >>> window on a time series. A standard piece of advice for doing this in
> >>> R is something like the code that follows at the end of this email. I
> >>> am currently using an "embed" variant of that code, and that piece of
> >>> advice is out there too.
> >>>
> >>> But it occurs to me that for such an easily specified matrix
> >>> operation, standard R code is really slow: rollapply returns to the R
> >>> interpreter at each window step for a new lm. At its heart, all lm is
> >>> is (X^t X)^(-1) X^t y, and if you think about doing that with Rcpp in
> >>> a rolling window, you are just incrementing a counter, peeling off
> >>> rows (or columns) of X and y for a particular window size, and
> >>> following that up with some matrix multiplication in a loop. The
> >>> pseudo-code for that Rcpp practically writes itself, and you might
> >>> want a wrapper like rolling_lm(y = y, x = x, width = 4) [a rough
> >>> sketch of this appears below, after the quoted thread].
> >>>
> >>> My question is this: have any of the thousands of R packages out
> >>> there published anything like that? Rolling-window multiple
> >>> regressions that stay in C/C++ until the rolling window completes?
> >>> No sense in writing it if it exists.
> >>>
> >>> Thanks,
> >>> Jeremiah
> >>>
> >>> Standard (slow) advice for "rolling window regression" follows:
> >>>
> >>> set.seed(1)
> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
> >>> colnames(z) <- c("y", "x")
> >>>
> >>> ## rolling regression of width 4
> >>> rollapply(z, width = 4,
> >>>           function(x) coef(lm(y ~ x, data = as.data.frame(x))),
> >>>           by.column = FALSE, align = "right")
> >>>
> >>> ## result is identical to
> >>> coef(lm(y ~ x, data = z[1:4, ]))
> >>> coef(lm(y ~ x, data = z[2:5, ]))
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
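P.S. Returning to the pseudo-code in the original question, here is a rough sketch
in plain R (not Rcpp, and not what the roll package actually does; the rolling_lm
name and interface are just the hypothetical wrapper from the original post). It
keeps X'X and X'y up to date with rank-one updates as the window slides and solves
the normal equations at each step, which is precisely the "straightforward matrix
multiplication" approach Achim cautions about:

## Rough sketch of the hypothetical rolling_lm wrapper, in plain R; an
## Rcpp version would perform the same updates in C++. It maintains X'X
## and X'y by rank-one updates as the window slides and solves the
## normal equations (X'X) b = X'y at each step.
rolling_lm <- function(y, x, width) {
  X <- cbind(1, x)                       # prepend intercept column
  n <- nrow(X); p <- ncol(X)
  out <- matrix(NA_real_, n, p)          # one coefficient row per window end
  XtX <- crossprod(X[1:width, , drop = FALSE])
  Xty <- crossprod(X[1:width, , drop = FALSE], y[1:width])
  out[width, ] <- solve(XtX, Xty)
  for (i in seq_len(n - width) + width) { # empty when n == width
    xin  <- X[i, ]; xout <- X[i - width, ]
    XtX  <- XtX + tcrossprod(xin) - tcrossprod(xout)  # add new row, drop old
    Xty  <- Xty + xin * y[i] - xout * y[i - width]
    out[i, ] <- solve(XtX, Xty)
  }
  out
}

## e.g., on the small example from the thread:
## m <- coredata(z); rolling_lm(y = m[, "y"], x = m[, "x"], width = 4)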
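And a small illustration of Achim's numerical point, on deliberately
ill-conditioned (hypothetical) data: forming X'X squares the condition number,
so the normal equations can fail outright where lm's pivoted QR still copes:

## Hypothetical ill-conditioned example illustrating Achim's caveat:
## x2 is nearly collinear with x1, so X'X is close to singular.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-9)       # almost an exact copy of x1
y  <- x1 + x2 + rnorm(n)
X  <- cbind(1, x1, x2)

coef(lm(y ~ x1 + x2))                # pivoted QR copes, typically by
                                     # aliasing the redundant column (NA)
solve(crossprod(X), crossprod(X, y)) # normal equations: typically errors
                                     # with "system is computationally
                                     # singular"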