On 19/04/2018 at 09:30, Tomas Kalibera wrote:
On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
On 18/04/2018 5:08 PM, Tousey, Colton wrote:
Hello,

I want to report a bug in R that prevents me from exporting a matrix with more than 2,147,483,647 elements (C's int limit) using write.csv or write.table. This bug has been reported before: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However, there appears to be no solution or fix in upcoming R releases.

The error message comes from the writetable part of the utils package, in the io.c source code (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
    /* quick integrity check */
    if(XLENGTH(x) != (R_len_t)nr * nc)
        error(_("corrupt matrix -- dims not not match length"));

The issue is that nr*nc is computed as an int, and the size of my matrix, 2.8 billion elements, exceeds C's int limit, so the check fails.

Yes, looks like a typo:  R_len_t is an int, and that's how nr was declared.  It should be R_xlen_t, which is bigger on machines that support big vectors.
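
To illustrate the difference between the two casts, here is a minimal standalone sketch. It does not use R's headers: len_t and xlen_t are hypothetical stand-ins for R_len_t and R_xlen_t (assuming a 64-bit platform), and the 70000 x 40000 dimensions mentioned below are only one example that gives 2.8 billion elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not R's real typedefs. */
typedef int32_t len_t;   /* plays the role of R_len_t (an int)          */
typedef int64_t xlen_t;  /* plays the role of R_xlen_t with long vectors */

/* nr * nc with the operand widened BEFORE multiplying,
   so the product is computed in 64 bits and cannot overflow. */
xlen_t wide_product(len_t nr, len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return (xlen_t)nr * nc;
}

/* Model of the current check: it can only succeed when the
   product is representable in the narrow type. */
int check_narrow(xlen_t length, len_t nr, len_t nc)
{
    xlen_t p = wide_product(nr, nc);
    return p <= INT32_MAX && length == p;
}

/* Model of the check with the wider cast. */
int check_wide(xlen_t length, len_t nr, len_t nc)
{
    return length == wide_product(nr, nc);
}
```

With a length of 2800000000 and dimensions 70000 x 40000, check_wide accepts the matrix while check_narrow rejects it, even though the dimensions do match the length.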

I haven't tested the change; there may be something else in that function that assumes short vectors.
Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures.

That would be a definite fix, but before such a deep rewrite is undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * nc") would be sufficient for cases where nr and nc are in int range but their product can reach the long vector limit:

replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                    &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, i + (R_xlen_t)j*nr), 0, quote_col[j], qmethod,
                    &strBuf, sdec);
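
A note on the index arithmetic, as a standalone sketch rather than a tested patch: with i, j and nr all declared as int, the product j*nr is evaluated in int unless one of its operands is widened first, so the cast has to apply to the multiplication and not only to i. The typedef xlen_t and the helper elt_index below are hypothetical stand-ins and are not part of R.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for R_xlen_t on a 64-bit platform. */
typedef int64_t xlen_t;

/* Column-major offset of element (i, j) in a matrix with nr rows.
   j is widened before the multiplication, so j * nr is computed
   in 64 bits even when it exceeds the int range. */
xlen_t elt_index(int i, int j, int nr)
{
    assert(i >= 0 && i < nr && j >= 0);
    return i + (xlen_t)j * nr;
}
```

For the last element of a hypothetical 70000 x 40000 matrix, elt_index(69999, 39999, 70000) gives 2799999999, which is beyond the int range but well within the wider type.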

Serguei

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
