[Rd] segfault / crash when asking for large memory via strrep()
We've had this more general topic on R-help, and also in R-devel recently. There's one case here where I get the feeling R never gets into swapping but more directly aborts possibly from a bug we can more easily fix.

Today I've been working (successfully! - not yet committed) at fixing str() for very large strings. In this process, I've found that

    pc <- function(.) paste(., collapse=".1.2.3.4.5.")
    p <- function(.) strrep(pc(.), 64L)
    p(p(p(p(LETTERS))))

produces a (memory related) segmentation fault (aka "crash") very reproducibly and relatively quickly both on my Linux (Fedora 22) desktop and on our Windows server.

     *** caught segfault ***
    address 0x7fc52dc89000, cause 'memory not mapped'

    Traceback:
     1: strrep(pc(.), 64L)
     2: p(p(p(p(LETTERS))))
     3: system.time(L2 <- p(p(p(p(LETTERS)))))

In the debugger, the symptoms point to the possibility of a bug just in the C parts of strrep():

    Program received signal SIGSEGV, Segmentation fault.
    0x754d6223 in __strcpy_sse2_unaligned () from /usr/lib64/libc.so.6
    Missing separate debuginfos, use: dnf debuginfo-install
      bzip2-libs-1.0.6-14.fc22.x86_64 libgcc-5.3.1-6.fc22.x86_64
      libgfortran-5.3.1-6.fc22.x86_64 libgomp-5.3.1-6.fc22.x86_64
      libicu-54.1-4.fc22.x86_64 libquadmath-5.3.1-6.fc22.x86_64
      libstdc++-5.3.1-6.fc22.x86_64 ncurses-libs-5.9-18.20150214.fc22.x86_64
      pcre-8.38-4.fc22.x86_64 readline-6.3-5.fc22.x86_64
      xz-libs-5.2.0-2.fc22.x86_64 zlib-1.2.8-7.fc22.x86_64
    (gdb) bt
    #0  0x754d6223 in __strcpy_sse2_unaligned () from /usr/lib64/libc.so.6
    #1  0x00457def in do_strrep (call=<optimized out>, op=<optimized out>,
        args=<optimized out>, env=<optimized out>)
        at ../../../R/src/main/character.c:1658
    #2  0x004d6844 in bcEval (body=body@entry=0xd66840, rho=rho@entry=0x45253b8,
        useCache=useCache@entry=TRUE) at ../../../R/src/main/eval.c:5648
    #3  0x004dd240 in Rf_eval (e=0xd66840, rho=0x45253b8)
        at ../../../R/src/main/eval.c:616
    #4  0x004dedaf in Rf_applyClosure (call=call@entry=0x45250a8,
        op=op@entry=0xd668e8, arglist=0x45251f8, rho=rho@entry=0x4525000,
        suppliedvars=0xa57188) at ../../../R/src/main/eval.c:1134
    #5  0x004dd3b1 in Rf_eval (e=0x45250a8, rho=0x4525000)
        at ../../../R/src/main/eval.c:732
    #6  0x004dedaf in Rf_applyClosure (call=call@entry=0x4525718,
        op=op@entry=0x4524d28, arglist=0x4524f90, rho=rho@entry=0xa8ea30,
        suppliedvars=0xa57188) at ../../../R/src/main/eval.c:1134
    #7  0x004dd3b1 in Rf_eval (e=0x4525718, rho=0xa8ea30)
        at ../../../R/src/main/eval.c:732
    #8  0x004e0cde in do_set (call=0x4525670, op=0xa61358,
        args=<optimized out>, rho=0xa8ea30) at ../../../R/src/main/eval.c:2196

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] segfault / crash when asking for large memory via strrep()
That would be because the product nc * ni overflows in

    cbuf = buf = CallocCharBuf(nc * ni);

Since we disallow strings with more than 2^31-1 bytes we could test and reject this. It might be more future-proof to change the declaration of

    int j, ni, nc;

to

    R_xlen_t j, ni, nc;

and let the character allocation code reject, but that would create a memory leak since the Free call isn't reached.

This is a problem in any case though, as

    SET_STRING_ELT(s, is, markKnown(cbuf, STRING_ELT(x, ix)));

could throw errors for a number of reasons and then the Free() is not reached. It would be better to use R_alloc or register a cleanup function to call Free on a jump.

Best,

luke

On Wed, 1 Jun 2016, Martin Maechler wrote:

> We've had this more general topic on R-help, and also in R-devel recently.
> [...]

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
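The overflow described above can be checked by hand. By my count pc(LETTERS) is 301 bytes (26 letters plus 25 separators of 11 bytes each), so the fourth call to p() asks for nc * ni = 78905344 * 64 = 5049942016 bytes, which exceeds 2^31-1 = 2147483647. With 32-bit int arithmetic that product typically wraps to 754974720, so the buffer would be too small for the copy that follows, which is consistent with the crash inside strcpy. The following is a minimal standalone sketch of a "test and reject" check, not the code committed to R: the names checked_total and repeat_string are hypothetical, and plain calloc stands in for CallocCharBuf.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of R): compute nc * ni without overflow.
   Returns 1 and stores the product in *total when it is at most INT_MAX
   (2^31-1 where int is 32 bits); returns 0 otherwise. */
int checked_total(size_t nc, int ni, size_t *total)
{
    assert(total != NULL);
    if (ni < 0)
        return 0;
    /* Divide instead of multiplying, so the test itself cannot overflow. */
    if (ni > 0 && nc > (size_t) INT_MAX / (size_t) ni)
        return 0;
    *total = nc * (size_t) ni;
    return 1;
}

/* Hypothetical stand-in for the copy loop: returns a newly allocated string
   holding ni copies of s, or NULL if the size check or allocation fails.
   The caller frees the result. */
char *repeat_string(const char *s, int ni)
{
    size_t nc, total;
    char *buf, *p;

    assert(s != NULL);
    nc = strlen(s);
    if (!checked_total(nc, ni, &total))
        return NULL;                 /* reject before allocating anything */
    buf = calloc(total + 1, 1);      /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;
    p = buf;
    for (int j = 0; j < ni; j++) {
        memcpy(p, s, nc);
        p += nc;
    }
    return buf;
}
```

Rejecting before the allocation also sidesteps the leak concern for this particular failure, since no buffer exists yet when the check fails.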
Re: [Rd] segfault / crash when asking for large memory via strrep()
I've added a size/overflow check before the buffer allocation in R-devel and R-patched.

It would be a good idea sometime to review the use of calloc ... free patterns to make sure the ... can't raise an error or otherwise jump and leave the memory pointer dangling.

Best,

luke

On Wed, 1 Jun 2016, luke-tier...@uiowa.edu wrote:

> That would be because the product nc * ni overflows in
> [...]
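To illustrate the calloc ... free concern, here is a small standalone simulation, assuming that an R error behaves like a non-local jump out of the C function. It does not use R's API: setjmp/longjmp stand in for the error jump, and every name in it (tracked_alloc, tracked_free, raise_error, work, leaked_after, pending) is hypothetical. It only shows why the free after the "..." is skipped on a jump, and how registering the buffer for cleanup avoids the leak.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf error_target;   /* stand-in for the top-level error handler */
static void *pending = NULL;   /* buffer to release if an error jump occurs */
static int live_buffers = 0;   /* buffers allocated but not yet freed */

static void *tracked_alloc(size_t n)
{
    void *p = calloc(n, 1);
    if (p != NULL)
        live_buffers++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_buffers--;
    }
}

/* Simulated error: run the registered cleanup, then jump away. */
static void raise_error(void)
{
    tracked_free(pending);
    pending = NULL;
    longjmp(error_target, 1);
}

/* The calloc ... free pattern; the "..." part may raise an error. */
static void work(int fail, int use_cleanup)
{
    char *buf = tracked_alloc(16);
    if (buf == NULL)
        return;
    if (use_cleanup)
        pending = buf;         /* register the buffer before risky code */
    if (fail)
        raise_error();         /* the free below is never reached */
    pending = NULL;
    tracked_free(buf);
}

/* Returns how many buffers are still allocated after one call to work(). */
int leaked_after(int fail, int use_cleanup)
{
    int before = live_buffers;
    assert(pending == NULL);
    if (setjmp(error_target) == 0)
        work(fail, use_cleanup);
    return live_buffers - before;
}
```

In this sketch leaked_after(1, 0) reports one leaked buffer, while leaked_after(1, 1) and leaked_after(0, 0) report none. Within R itself, memory obtained from R_alloc is reclaimed by R, which is why it was suggested in the thread as the simpler option.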
[Rd] [RfC] Family dispersion
Hi,

I'd like to hear your opinion about the following proposal to make the computation of dispersion in GLMs more flexible. Dispersion is used in summary.glm; the relevant code chunk with the dispersion calculation is listed below (from glm.R):

    summary.glm <- function(object, dispersion = NULL, correlation = FALSE,
                            symbolic.cor = FALSE, ...)
    {
        est.disp <- FALSE
        df.r <- object$df.residual
        if(is.null(dispersion))  # calculate dispersion if needed
            dispersion <-
                if(object$family$family %in% c("poisson", "binomial")) 1
                else if(df.r > 0) {
                    est.disp <- TRUE
                    if(any(object$weights==0))
                        warning("observations with zero weight not used for calculating dispersion")
                    sum((object$weights*object$residuals^2)[object$weights > 0])/ df.r
                } else {
                    est.disp <- TRUE
                    NaN
                }
        # ...
    }

Many exponential families have unit dispersion, or can be cast to have unit dispersion, e.g. hypergeometric, negative binomial, and so on. However, summary.glm only assigns unit dispersion to Poisson and binomial families, as the code above indicates.

My suggestion is to make this check more general by having a 'dispersion' slot in the family class; for instance, we would have poisson(...)$dispersion = 1 and binomial(...)$dispersion = 1. The updated summary.glm would be:

    default.dispersion <- function (object, ...) {
        df.r <- object$df.residual
        if (df.r > 0) {
            if (any(object$weights == 0))
                warning("observations with zero weight not used for calculating dispersion")
            sum((object$weights * object$residuals ^ 2)[object$weights > 0]) / df.r
        } else NaN
    }

    summary.glm <- function(object, dispersion = default.dispersion,
                            correlation = FALSE, symbolic.cor = FALSE, ...)
    {
        if (!is.null(object$family$dispersion))  # use family dispersion?
            dispersion <- object$family$dispersion
        est.disp <- is.function(dispersion)
        dispersion <- if (est.disp) dispersion(object, ...) else dispersion
        df.r <- object$df.residual
        # ... (unchanged code below)
    }

Note that 'dispersion' can be a function taking a glm object or a number (e.g. 1). Here are some examples:

    R> library(MASS)
    R> gm <- glm(formula, family=Gamma())
    R> summary(gm, dispersion = gamma.dispersion) # ML estimate of dispersion

    R> set.dispersion <- function (fam, disp) # update family dispersion
    +    structure(within(unclass(fam), dispersion <- disp), class = "family")
    R> gm <- glm(formula, family=set.dispersion(Gamma(), gamma.dispersion))
    R> summary(gm) # use family dispersion

    R> Exp <- function (...) set.dispersion(Gamma(...), 1)

Thanks in advance for the feedback.

Cheers,
Luis

-- 
Computers are useless. They can only give you answers.
                -- Pablo Picasso

-- 
Luis Carvalho
Associate Professor
Dept. of Mathematics and Statistics
Boston University
http://math.bu.edu/people/lecarval