Re: [Rd] dgTMatrix Segmentation Fault
Good day,

Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for numeric overflow. We pinpointed the cause:

(gdb) info locals
i = 0
j = 10738
m = 200000
n = 50000
ans = 0x5b332790
aa = 0x5b3327c0

There is a line of C code in dgeMatrix.c:

    for (i = 0; i < m; i++)
        aa[i] += xx[i + j * m];

i, j and m are all int, so i + j * m overflows:

(lldb) print 0 + 10738 * 200000
(int) $5 = -2147367296

So either the code should check that this doesn't occur, or it should be adjusted to allow for large indices.

If anyone is interested, this is in the context of single-cell ATAC-seq data, which typically has about 200,000 genomic regions (rows) and perhaps 100,000 biological cells (columns).

-- 
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
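One common way to avoid this kind of overflow is to widen one operand before multiplying, so the product is computed in a 64-bit type. The sketch below is hypothetical and for illustration only: add_column and wrap_to_int32 are made-up names, not functions from dgeMatrix.c or the Matrix package, and it assumes an LP64 platform (such as x86_64 Linux) where ptrdiff_t is 64 bits. Inside R itself, the C API's R_xlen_t type serves this purpose for long-vector indices. Strictly, signed int overflow is undefined behaviour in C, so the negative number lldb printed is what typically happens on x86_64 rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelled on the dgeMatrix.c loop: adds column j of an
 * m-by-n column-major matrix xx into aa. The column offset is computed in
 * ptrdiff_t (64 bits on LP64 platforms such as x86_64 Linux) by widening j
 * before the multiplication, so j * m is never evaluated in int. */
void add_column(double *aa, const double *xx, int m, int j)
{
    assert(m >= 0 && j >= 0);
    ptrdiff_t offset = (ptrdiff_t) j * m;  /* widen first, then multiply */
    for (int i = 0; i < m; i++)
        aa[i] += xx[offset + i];           /* offset + i also stays in ptrdiff_t */
}

/* Illustration only: the value a 32-bit two's-complement int would hold
 * after v wraps around, computed without signed-overflow undefined behaviour. */
int64_t wrap_to_int32(int64_t v)
{
    uint32_t u = (uint32_t) v;             /* conversion to unsigned is modulo 2^32 */
    return u > (uint32_t) INT32_MAX ? (int64_t) u - 4294967296LL : (int64_t) u;
}
```

For example, add_column on a 3-by-2 matrix stored as {1, 2, 3, 4, 5, 6} with j = 1 adds the second column, leaving aa = {4, 5, 6}, and wrap_to_int32((int64_t) 10738 * 200000) returns -2147367296, the value lldb printed.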
[Rd] svd For Large Matrix
Good day,

I have a real scenario involving 45 million biological cells (samples) and 60 proteins (variables) which causes svd to crash with a segmentation fault. I thought this might be a good example of why svd could benefit from an upgrade to support long vectors.

test <- matrix(rnorm(45000000 * 60), ncol = 60)
testSVD <- svd(test)

 *** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'

Traceback:
 1: La.svd(x, nu, nv)
 2: svd(test)
Re: [Rd] [External] svd For Large Matrix
Good day,

Ah, I was confident it wouldn't be environment-specific, but it is. My environment is:

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

It crashes at about 180 GB of RAM usage; the server has 1024 GB of physical RAM. Modestly downsampling to 30 million cells avoids the segmentation fault. The segmentation fault originates in BLAS:

Program received signal SIGSEGV, Segmentation fault.
0x77649c10 in ATL_dgecopy () from /usr/lib/x86_64-linux-gnu/libblas.so.3
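A quick count of matrix elements may explain why 30 million cells succeed where 45 million fail. With 60 columns, the larger matrix has more entries than the largest 32-bit signed int can index, while the smaller one stays below that limit. This is an inference only: it assumes ATL_dgecopy computes an element offset in a 32-bit int somewhere, which has not been confirmed from the ATLAS source.

```latex
\begin{aligned}
45\,000\,000 \times 60 &= 2\,700\,000\,000 > 2^{31} - 1 = 2\,147\,483\,647\\
30\,000\,000 \times 60 &= 1\,800\,000\,000 < 2^{31} - 1
\end{aligned}
```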
[Rd] Data Frame Conversion and Table Input
Good day,

as.data.frame is documented on ?table and on ?as.data.frame (for list and matrix inputs). For list and matrix inputs, there is an argument named optional, which allows column names to be preserved. If the input is a table, there is no such option. Could the API be made consistent across base data types?
[Rd] Combinations and Permutations
Good day,

In utils, there is a function named combn. Because permutations are so closely related to combinations mathematically, it would seem a natural complement for utils to offer a permutations function too. Could one be added, so that package developers can avoid an extra package dependency?
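For concreteness, the requested kind of function enumerates ordered selections of k items from n, where combn enumerates unordered ones. The C sketch below is only an illustration under assumptions: permutations, count_visitor and n_permutations are hypothetical names, the callback interface is one design among several, and a function in utils would naturally have an R-level interface instead.

```c
#include <assert.h>
#include <stdint.h>

/* Called once per permutation; perm[0..k-1] holds distinct indices in 0..n-1. */
typedef void (*perm_visitor)(const int *perm, int k, void *ctx);

/* Fills position `depth` with each index not yet used (tracked as bits in
 * `used`), in increasing order, so selections are visited in
 * lexicographic order. */
static void permute_rec(int n, int k, int depth, uint64_t used, int *perm,
                        perm_visitor visit, void *ctx)
{
    if (depth == k) {
        visit(perm, k, ctx);
        return;
    }
    for (int v = 0; v < n; v++) {
        uint64_t bit = UINT64_C(1) << v;
        if (used & bit)
            continue;
        perm[depth] = v;
        permute_rec(n, k, depth + 1, used | bit, perm, visit, ctx);
    }
}

/* Hypothetical interface: visits all n!/(n-k)! ordered selections of k items
 * out of n (n <= 64 because of the bit mask). `perm` must hold k ints. */
void permutations(int n, int k, int *perm, perm_visitor visit, void *ctx)
{
    assert(n >= 0 && n <= 64 && k >= 0 && k <= n);
    permute_rec(n, k, 0, 0, perm, visit, ctx);
}

/* Example visitor: increments the long pointed to by ctx. */
void count_visitor(const int *perm, int k, void *ctx)
{
    (void) perm;
    (void) k;
    ++*(long *) ctx;
}

/* Closed-form count n (n-1) ... (n-k+1); small n only, to avoid overflow. */
long n_permutations(int n, int k)
{
    long total = 1;
    for (int i = 0; i < k; i++)
        total *= n - i;
    return total;
}
```

With n = 4 and k = 2, count_visitor reaches 12, matching n_permutations(4, 2) = 4 × 3, and the first selections visited are (0, 1), (0, 2), (0, 3), (1, 0). Recursion with a bit mask keeps the sketch short; an iterative next-permutation step would avoid recursion but takes more code.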
[Rd] save.image Non-responsive to Interrupt
Hello,

Could save.image() be redesigned so that it responds promptly to Ctrl+C? When the workspace is large, it can block the command line for several hours.
[Rd] Subset has No Examples for Vector Data
Hello,

Could the documentation page for subset gain an example of how to use it for something other than a data frame or matrix? I arrived at

> random <- LETTERS[rpois(100, 10)]
> subset(table(random), x > 10)
named integer(0)

I expected part of the table to be returned rather than an empty vector.
[Rd] confint Attempts to Use All Server CPUs by Default
Hello,

Would a less resource-intensive default number of CPUs for confint, such as 1, be safer? I noticed excessive CPU usage on a server managed by our IT administrator, three-quarters of which was already in use by another staff member: the confidence interval calculation in an R Markdown document suddenly went from two seconds to ninety seconds because of competition between users for CPUs.

Also, there is no mention of such parallel processing in ?confint, so it was not clear at first where to look for the cause of the performance degradation. It could at least be described in the manual page so that users would know that export OPENBLAS_NUM_THREADS=1 is a solution.
Re: [Rd] confint Attempts to Use All Server CPUs by Default
Hello,

It is from the stats package, applied to the output of a logistic regression fitted by glm with family = "binomial". So it is a base package, not an add-on CRAN package. I shall recommend to our system administrator that R be linked to FlexiBLAS.
Re: [Rd] Changes in the survival package (long)
Good day,

It is impressive to see such sustained package maintenance.

-- 
Dr. Dario Strbenac
Bioinformatics Research Associate
School of Mathematics and Statistics, University of Sydney
Camperdown N.S.W. 2050, Australia
[Rd] stats glm Response Format Ambiguity
Hello,

Could a clarification be added to glm's documentation? glmnet, in contrast, leaves no ambiguity about what it expects for the response:

glm:
  y: is a vector of observations of length n

glmnet:
  y: For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class). For family="multinomial", can be a nc>=2 level factor, or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it will be coerced into a factor.

"a vector of observations" doesn't really narrow it down much. The warning emitted when y is a vector of proportions isn't particularly informative, either.
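As far as can be read from the initialize expression of binomial() in the stats package, glm accepts three forms of binomial response: a factor, whose first level is treated as failure and all other levels as success; a numeric vector of 0/1 outcomes or proportions, where the prior weights give the number of trials; or a two-column matrix of success and failure counts. The C sketch below transcribes the numeric and two-column cases for one observation. It is a minimal sketch under that reading, not R's code: binom_obs, binom_from_value and binom_from_counts are hypothetical names, and the 1e-3 tolerance is an assumption about R's check. It shows why proportions supplied with the default weights of 1 trigger the warning "non-integer #successes in a binomial glm!": the implied success count, weight × proportion, is not a whole number.

```c
#include <assert.h>
#include <stdbool.h>

/* One binomial observation after normalisation. */
typedef struct {
    double y;              /* proportion of successes, in [0, 1] */
    double weight;         /* prior weight; equals the trial count for counts */
    bool ok;               /* false if a single-column y was outside [0, 1] */
    bool warn_non_integer; /* true where glm would warn about non-integer values */
} binom_obs;

/* True if non-negative x is within 1e-3 of a whole number. */
static bool near_integer(double x)
{
    double frac = x - (double) (long long) x;  /* truncation equals floor for x >= 0 */
    return frac <= 1e-3 || frac >= 1.0 - 1e-3;
}

/* Single-column form: a 0/1 outcome or a proportion y with prior weight w >= 0.
 * The implied success count is w * y, so a proportion supplied with the
 * default weight of 1 usually gives a non-integer count. */
binom_obs binom_from_value(double y, double w)
{
    assert(w >= 0);
    binom_obs o = { w == 0 ? 0.0 : y, w, true, false };
    if (o.y < 0 || o.y > 1) {
        o.ok = false;  /* R stops with an error here rather than warning */
        return o;
    }
    o.warn_non_integer = !near_integer(w * o.y);
    return o;
}

/* Two-column form: one row of cbind(successes, failures), first column =
 * successes. It becomes a proportion whose weight is the number of trials. */
binom_obs binom_from_counts(double successes, double failures, double w)
{
    assert(successes >= 0 && failures >= 0 && w >= 0);
    double n = successes + failures;
    binom_obs o = { n == 0 ? 0.0 : successes / n, w * n, true,
                    !near_integer(successes) || !near_integer(failures) };
    return o;
}
```

For instance, a proportion of 0.3 with weight 1 is flagged, whereas the same proportion with weight 10 (3 successes out of 10 trials) is not, and the count row (3, 7) becomes a proportion of 0.3 with weight 10.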