Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements

Tomas Kalibera Tue, 22 May 2018 06:13:40 -0700

Fixed in R-devel 74754.
Tomas

On 04/19/2018 12:15 PM, Tomas Kalibera wrote:

On 04/19/2018 11:47 AM, Serguei Sokol wrote:
On 19/04/2018 at 09:30, Tomas Kalibera wrote:
On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
On 18/04/2018 5:08 PM, Tousey, Colton wrote:
Hello,
I want to report a bug in R that is limiting my ability to export a matrix with write.csv or write.table with over 2,147,483,648 elements (C's int limit). I found this bug already reported before: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However, there appears to be no solution or fix in upcoming R version releases.
The error message is coming from the writetable part of the utils package in the io.c source code (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
    /* quick integrity check */
    if(XLENGTH(x) != (R_len_t)nr * nc)
        error(_("corrupt matrix -- dims not not match length"));
The issue is that nr*nc is an integer and the size of my matrix, 2.8 billion elements, exceeds C's limit, so the check forces the code to fail.
Yes, looks like a typo: R_len_t is an int, and that's how nr was declared. It should be R_xlen_t, which is bigger on machines that support big vectors.
I haven't tested the change; there may be something else in that function that assumes short vectors.
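
A minimal, self-contained sketch of the cast issue discussed above. It does not use R's headers: demo_len_t and demo_xlen_t are hypothetical stand-ins for R_len_t and R_xlen_t (the latter assumed to be 64 bits wide), and the 53000 x 53000 dimensions are an assumed example giving roughly 2.8 billion elements. Because signed int overflow is undefined behaviour in C, the sketch never multiplies in int; it widens first and then checks whether the product would have fit.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins, not R's real typedefs. */
typedef int     demo_len_t;   /* role of R_len_t: a plain int */
typedef int64_t demo_xlen_t;  /* role of R_xlen_t on a 64-bit build (assumed) */

/* Widen one operand first so the multiplication is carried out in 64 bits. */
static demo_xlen_t demo_element_count(demo_len_t nr, demo_len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return (demo_xlen_t)nr * nc;
}

/* Returns 1 if nr * nc is representable as an int, 0 if an int product
   would overflow (the situation the integrity check runs into). */
static int demo_fits_in_int(demo_len_t nr, demo_len_t nc)
{
    return demo_element_count(nr, nc) <= INT_MAX;
}
```

With these helpers, demo_element_count(53000, 53000) yields 2809000000, while demo_fits_in_int(53000, 53000) reports that the same product cannot be held in an int.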
Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures.
That would be a definitive fix, but before such a deep rewrite is undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * nc") will be sufficient for cases where nr and nc are in int range but their product can reach the long vector limit:
replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                    &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0, quote_col[j], qmethod,
                    &strBuf, sdec);
Unfortunately we can't do that: x is a matrix of an atomic vector type. VECTOR_ELT takes elements of a generic vector, so it cannot be applied to "x". But even if we extracted a single element from "x" (e.g. via a type-switch etc.), we would not be able to pass it to EncodeElement0, which expects a full atomic vector (that is, including its header). Instead we would have to call functions like EncodeInteger, EncodeReal0, etc. on the individual elements, which is then the same as changing EncodeElement0 or implementing a new version of it. This does not seem that hard to fix; it is just not as trivial as changing the cast.
Tomas
Serguei
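
The type-switch approach described in the last reply can be illustrated with a small standalone sketch. Everything here is hypothetical and independent of R's sources: demo_vector, demo_encode_int, demo_encode_real and demo_encode_at are invented names that only mimic the idea of dispatching on the element type and encoding one element at a 64-bit, column-major index. It is not the change that went into R-devel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical miniature of an atomic vector: a type tag plus raw data. */
typedef enum { DEMO_INT, DEMO_REAL } demo_type;

typedef struct {
    demo_type type;
    int64_t   length;                      /* 64-bit length, as for long vectors */
    union { const int *i; const double *d; } data;
} demo_vector;

/* Per-type encoders working on a single value rather than a whole vector. */
static const char *demo_encode_int(int v, char *buf, size_t n)
{
    snprintf(buf, n, "%d", v);
    return buf;
}

static const char *demo_encode_real(double v, char *buf, size_t n)
{
    snprintf(buf, n, "%.15g", v);
    return buf;
}

/* Column-major offset computed in 64 bits so i + j*nr cannot overflow an int. */
static int64_t demo_index(int i, int j, int nr)
{
    return (int64_t)i + (int64_t)j * nr;
}

/* Type-switch: pick the element at idx and hand it to the matching encoder. */
static const char *demo_encode_at(const demo_vector *x, int64_t idx,
                                  char *buf, size_t n)
{
    assert(idx >= 0 && idx < x->length);
    switch (x->type) {
    case DEMO_INT:  return demo_encode_int(x->data.i[idx], buf, n);
    case DEMO_REAL: return demo_encode_real(x->data.d[idx], buf, n);
    }
    buf[0] = '\0';  /* unknown type tag: return an empty string */
    return buf;
}
```

The point of the design is that the index type and the per-element encoders are decoupled: only demo_index and the length field need to be wide, while each encoder sees one plain value.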


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
