Re: [Rd] dgTMatrix Segmentation Fault

2021-06-09 Thread Dario Strbenac via R-devel
Good day,

Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any 
numeric overflow. We pinpointed the cause:

(gdb) info locals
i = 0
j = 10738
m = 20
n = 5
ans = 0x5b332790
aa = 0x5b3327c0

The relevant line of C code in dgeMatrix.c is

for (i = 0; i < m; i++) aa[i] += xx[i + j * m];

i, j and m are all int, so the index i + j * m overflows:
(lldb) print 0 + 10738 * 20
(int) $5 = -2147367296

So, either the code should check that this doesn't occur, or be adjusted to 
allow for large indexes.
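
A small R sketch of the wraparound, for anyone who wants to reproduce the arithmetic. It takes j = 10738 from the debugger output and assumes m = 200000 (an assumption: that is the row count for which the wrapped product equals the -2147367296 printed above). C's 32-bit int is emulated in double arithmetic, because R's own integer multiplication returns NA on overflow rather than wrapping. The helper name wrap_int32 is made up for this example.

```r
# Emulate C's signed 32-bit wraparound using doubles (exact below 2^53).
wrap_int32 <- function(x) {
  r <- x %% 2^32                     # reduce modulo 2^32, result in [0, 2^32)
  if (r >= 2^31) r - 2^32 else r    # map the upper half to negative values
}

i <- 0
j <- 10738       # column index from the debugger output
m <- 200000      # ASSUMED row count, see the note above

exact   <- i + j * m          # the index the loop intended
wrapped <- wrap_int32(exact)  # the index a 32-bit int actually holds

exact > .Machine$integer.max  # TRUE: the product does not fit in an int
wrapped                       # -2147367296
```

Computing the index in a wider type (R_xlen_t or int64_t on the C side) would keep the exact value.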

If anyone is interested, this is in the context of single-cell ATAC-seq data, 
which typically has about 20 genomic regions (rows) and perhaps 10 
biological cells (columns).

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] svd For Large Matrix

2021-08-13 Thread Dario Strbenac via R-devel
Good day,

I have a real scenario involving 45 million biological cells (samples) and 60 
proteins (variables) which leads to a segmentation fault for svd. I thought 
this might be a good example of why it might benefit from a long vector upgrade.

test <- matrix(rnorm(45000000 * 60), ncol = 60)
testSVD <- svd(test)

 *** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'

Traceback:
 1: La.svd(x, nu, nv)
 2: svd(test)
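
For scale, a quick check of the element count against R's 32-bit integer limit. The dimensions are the ones stated above; reading the crash as an index-size problem is an assumption, not something the traceback confirms.

```r
n_cells    <- 45e6    # samples (rows), as stated above
n_proteins <- 60      # variables (columns), as stated above

# A double holds this product exactly; a 32-bit int index cannot.
n_elements <- n_cells * n_proteins

.Machine$integer.max                # 2147483647
n_elements > .Machine$integer.max   # TRUE
```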

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia



Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread Dario Strbenac via R-devel
Good day,

Ah, I was confident it wouldn't be environment-specific but it is. My 
environment is

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

It crashes at about 180 GB RAM usage. The server has 1024 GB physical RAM in 
it. Modestly downsampling to 30 million cells avoids the segmentation fault. 
The segmentation fault originates in BLAS:

Program received signal SIGSEGV, Segmentation fault.
0x77649c10 in ATL_dgecopy () from /usr/lib/x86_64-linux-gnu/libblas.so.3
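
The downsampling observation is consistent with a 32-bit element-index limit, which can be checked with a little arithmetic. This is a sketch of that reading only; the gdb output above does not by itself show why ATL_dgecopy faulted.

```r
n_proteins <- 60                      # columns, from the original post
limit      <- .Machine$integer.max    # 2^31 - 1

# Largest row count whose matrix still has at most 2^31 - 1 elements.
max_rows <- floor(limit / n_proteins)
max_rows                              # 35791394

30e6 <= max_rows                      # TRUE: the downsampled data fits
45e6 <= max_rows                      # FALSE: the full data does not
```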

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


[Rd] Data Frame Conversion and Table Input

2021-11-05 Thread Dario Strbenac via R-devel
Good day,

as.data.frame is documented on ?table and on ?as.data.frame (for list and 
matrix inputs). For inputs of list type and matrix type, there is an argument 
optional, which allows preservation of column names. If the input is a table, 
there is no such option. Could the API be made consistent for base data types?
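
A short sketch of the inconsistency, for reference. It only inspects the formal arguments of the methods in base and shows optional acting on a list input; the example data and the name counts are made up.

```r
# Which as.data.frame methods have an 'optional' argument?
"optional" %in% names(formals(as.data.frame.list))     # TRUE
"optional" %in% names(formals(as.data.frame.matrix))   # TRUE
"optional" %in% names(formals(as.data.frame.table))    # FALSE

# With a list, optional = TRUE keeps a non-syntactic name as it is.
counts <- list(`cell type` = c("B", "T"))       # made-up example data
names(as.data.frame(counts))                    # "cell.type"
names(as.data.frame(counts, optional = TRUE))   # "cell type"
```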

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


[Rd] Combinations and Permutations

2023-01-13 Thread Dario Strbenac via R-devel
Good day,

In utils, there is a function named combn. It would seem complementary for 
utils to also offer permutations because of how closely mathematically related 
they are to each other. Could permutations be added to save on a package 
dependency if developing a package?
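
For anyone needing this in the meantime, a base-R sketch is short. The function name permutations is hypothetical (it is not part of utils), and the recursion below is one simple approach rather than an efficient one.

```r
# Hypothetical helper, not part of utils: all orderings of x, one per row.
permutations <- function(x) {
  if (length(x) <= 1L) return(matrix(x, nrow = 1L))
  # Put each element first in turn, followed by every ordering of the rest.
  do.call(rbind, lapply(seq_along(x), function(i) {
    cbind(x[i], permutations(x[-i]))
  }))
}

perms <- permutations(1:3)
perms
nrow(perms) == factorial(3)   # TRUE: 3! orderings of three items
```

Ordered selections of k items out of n could be built on top of this by applying the helper to each column returned by combn(n, k).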

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


[Rd] save.image Non-responsive to Interrupt

2023-04-28 Thread Dario Strbenac via R-devel
Hello,

Could save.image() be redesigned so that it promptly responds to Ctrl+C? It 
prevents the command line from being used for a number of hours if the contents 
of the workspace are large.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


[Rd] Subset has No Examples for Vector Data

2023-10-10 Thread Dario Strbenac via R-devel
Hello,

Could the documentation page for subset gain an example of how to use it for 
something other than a data frame or matrix? I arrived at

> random <- LETTERS[rpois(100, 10)]
> subset(table(random), x > 10)
named integer(0)

I expected a part of the table to be returned rather than an empty vector.
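
For a vector-like input such as a table, subset's default method does not look up names inside the object, so the condition has to mention the table itself. A minimal sketch follows; the seed and the names counts and frequent are arbitrary choices for illustration.

```r
set.seed(42)                          # arbitrary, for reproducibility
random <- LETTERS[rpois(100, 10)]
counts <- table(random)

# The condition is an ordinary logical vector built from the table.
frequent <- subset(counts, counts > 10)
frequent

# Plain indexing selects the same entries.
counts[counts > 10]
```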

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


[Rd] confint Attempts to Use All Server CPUs by Default

2024-05-21 Thread Dario Strbenac via R-devel
Hello,

Would a less resource-intensive value, such as 1, be a safer default number of 
CPUs for confint? I noticed excessive CPU usage on a server managed by an I.T. 
administrator. Another staff member was already using three-quarters of it when 
the confidence interval calculation in an R Markdown document suddenly slowed 
from two seconds to ninety seconds because of competition for CPUs between 
users. Also, ?confint makes no mention of such parallel processing, so it was 
not clear at first where to look for the performance degradation. It could at 
least be described in the manual page so that users would know that export 
OPENBLAS_NUM_THREADS=1 is a solution.
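
As a starting point for diagnosing this kind of slowdown, the sketch below reports which BLAS and LAPACK libraries the session is linked against and whether the thread cap is visible to it. Whether OPENBLAS_NUM_THREADS is honoured depends on the BLAS in use, and the variable is normally read when the library loads, so it should be exported before R starts. The name threads is arbitrary.

```r
# BLAS and LAPACK shared libraries used by this session.
extSoftVersion()[["BLAS"]]
La_library()

# Thread cap as seen by the session; NA means the variable was not exported.
threads <- Sys.getenv("OPENBLAS_NUM_THREADS", unset = NA)
threads

# Logical cores on the machine, for comparison.
parallel::detectCores()
```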

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Rd] confint Attempts to Use All Server CPUs by Default

2024-05-22 Thread Dario Strbenac via R-devel
Hello,

It is from the stats package and applied to the output of logistic regression 
as implemented by glm with setting family = "binomial". So, it is a base 
package and not an add-on CRAN package. I shall recommend linking to FlexiBLAS 
to our system administrator.
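
For reference, the kind of call in question looks like the sketch below. The data are simulated and the variable names are invented; only glm, family = "binomial" and confint come from the description above.

```r
set.seed(1)                                        # arbitrary seed
n <- 200
x <- rnorm(n)                                      # invented predictor
y <- rbinom(n, size = 1, prob = plogis(0.5 * x))   # invented binary response

fit <- glm(y ~ x, family = "binomial")             # logistic regression

# Time the interval calculation; this is the step that slowed down.
timing <- system.time(ci <- confint(fit))
ci
timing[["elapsed"]]
```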

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Rd] Changes in the survival package (long)

2024-12-15 Thread Dario Strbenac via R-devel
Good day,

It is impressive to see such sustained package maintenance.

--
Dr. Dario Strbenac
Bioinformatics Research Associate
School of Mathematics and Statistics, University of Sydney
Camperdown N.S.W. 2050, Australia


[Rd] stats glm Response Format Ambiguity

2024-12-17 Thread Dario Strbenac via R-devel
Hello,

Could there be clarification added to glm's documentation? In contrast, glmnet 
leaves no ambiguity about what it expects for response.

glm:  y: is a vector of observations of length n
glmnet: y: For family="binomial" should be either a factor with two levels, or 
a two-column matrix of counts or proportions (the second column is treated as 
the target class). For family="multinomial", can be a nc>=2 level factor, or a 
matrix with nc columns of counts or proportions. For either "binomial" or 
"multinomial", if y is presented as a vector, it will be coerced into a factor.

"a vector of observations" doesn't really narrow it down much. The warning 
emitted when y is a vector of proportions isn't particularly informative, 
either.
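
For the binomial family, ?family describes three accepted response forms: a factor, a numeric vector of proportions with the totals given as weights, and a two-column matrix of successes and failures. The sketch below fits the same made-up data each way; the data and all object names are invented for illustration.

```r
# Made-up grouped data: successes out of a fixed number of trials per dose.
dose      <- 1:4
trials    <- rep(20, 4)
successes <- c(2, 5, 9, 14)

# 1. Two-column matrix: successes first, failures second.
fit_matrix <- glm(cbind(successes, trials - successes) ~ dose, family = binomial)

# 2. Proportions, with the number of trials supplied as weights.
fit_prop <- glm(successes / trials ~ dose, family = binomial, weights = trials)

# 3. Factor, one row per trial; the first level ("no") counts as failure.
outcome <- rep(rep(c("no", "yes"), 4),
               times = c(rbind(trials - successes, successes)))
long <- data.frame(dose = rep(dose, times = trials), y = factor(outcome))
fit_factor <- glm(y ~ dose, family = binomial, data = long)

# All three describe the same data, so the estimates should agree closely.
rbind(matrix     = coef(fit_matrix),
      proportion = coef(fit_prop),
      factor     = coef(fit_factor))
```

Leaving out weights in the second form is what turns the products of proportion and weight into non-integer counts, which is one way to end up with a warning from the binomial family.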

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia