Hi Martin, I think this is a general problem with any function that does dispatch on ... - I think for well-defined behaviour you always need to dispatch on pairs, folding/reducing (like in your code) to get a single value. The downside of this approach is obviously performance: for n inputs, you need n - 1 temporary objects, which is going to hamper performance (which is problematic for rbind since it's not uncommon to have lists of thousands of small data frames). I don't see anyway around this, except maybe you just have to have specialised variants (like dplyr::bind_rows() and data.table::rbindlist()) that don't do dispatch.
I've cc'd Michael since he might have missed the original thread, and I suspect he's probably thought more about this than me. Hadley On Tue, May 10, 2016 at 5:43 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote: > This was originally a bug report about Matrix, > > https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=6325&group_id=61 > but the bug is rather a "design" bug in R, or a limitation. > > This e-mail is a report of the status quo as I see it, and > call for comments, sugguests, help/hints for workarounds, > or even a suggestion for a programming task helping R core to > amend the status {re-writing the methods:::cbind and ...:rbind functions}: > > > If you read ?rbind carefully, you may have learned that rbind() and > cbind() are able to deal with S4 "matrix-like" objects, via the > hidden methods:::rbind / methods:::cbind functions > where these recursively build on appropriate S4 methods for > rbind2() / cbind2(). > > That is how cbind() and rbind() work nowadays for Matrix-package > matrices. > > However, there is problem lurking from the above paragraph, and > for experienced programmers / computer scientists that may even > be obvious: recursion. > > A simple MRE (minimal reproducible example) for the problem seen with Matrix: > > S <- sparseMatrix(i=1:4, j=9:6, x=pi*1:4) > n <- 410 # it succeeds for 407 -- sometimes even with 408 > X <- replicate(n, S, simplify = FALSE) > Xb <- do.call("rbind", X) > ## -> Error in as(x, "CsparseMatrix") : node stack overflow > > But as I said, that's not really the Matrix package. A small > reproducible example, not involving the Matrix package at all: > > MM <- setClass("mmatrix", contains="matrix") > setMethod("rbind2", signature(x="mmatrix", y="mmatrix"), > function(x,y) as(base::rbind(unclass(x), unclass(y)), "mmatrix")) > > (m5 <- MM(diag(5))) > m45 <- MM(matrix(1:20, 4,5)) > rbind(m5, m45) # fine > > ## fine with 400 : > mL.4c <- replicate(400, m45, simplify=FALSE) > mmm4 <- do.call(rbind, mL.4c) > stopifnot(is(mmm4, "mmatrix"), dim(mmm4) == c(1600L, 5L)) > ## not ok with 410 : > mL.410 <- replicate(410, m45, simplify=FALSE) > mmm4 <- do.call(rbind, mL.410) > ## Error in getExportedValue(pkg, name) (from #2) : node stack overflow > > and the "node stack overflow" is not too helpful. > Unfortunately, this is more than one problem, the first one being > that recursive function calls nowadays often end with this > "node stack overflow" error, rather than the somewhat more understandable > error message, namely > > Error: evaluation nested too deeply: infinite recursion / > options(expressions=)? > > And even worse, that nowadays increasing the option to a higher number N, > options(expressions = N) > > does not help at all in this situation, once the code is byte > compiled.... which of course is true for everything in base R. > > But that is basically only the reason for the not-so-helpful > error message (and that raising 'expressions' does not help!), > but the underlying problem is somewhat harder, namely the full > setup the S4-serving methods:::rbind() {and ...cbind()} using > recursion (in the way it does) at all. > > There is a simple, in my eyes ugly workaround which also will not scale well, > but works fine and fast enough for the above example: > > ## Simple ugly workaround .. but well fast enough : >> system.time({ > + r <- mL.410[[1]] > + for(i in seq_along(mL.410)[-1]) > + r <- rbind2(r, mL.410[[i]]) > + }) > user system elapsed > 0.083 0.000 0.083 >> dim(r) > [1] 1640 5 >> > > This should help Ben (the OP of the Matrix bug), and may be > something like that should also guide on how to re-write > the methods:::rbind() / methods:::cbind() in a non-recursive > fashion ? > > Thank you for your thoughts. > > Martin Maechler > ETH Zurich > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- http://hadley.nz ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel