Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements

Tomas Kalibera Thu, 19 Apr 2018 03:16:29 -0700

On 04/19/2018 11:47 AM, Serguei Sokol wrote:

On 19/04/2018 at 09:30, Tomas Kalibera wrote:
On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
On 18/04/2018 5:08 PM, Tousey, Colton wrote:
Hello,
I want to report a bug in R that is limiting my capabilities to export a matrix with write.csv or write.table with over 2,147,483,648 elements (C's int limit). I found this bug already reported before: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However, there appears to be no solution or fix in upcoming R version releases.
The error message is coming from the writetable part of the utils package in the io.c source code (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
    /* quick integrity check */
    if(XLENGTH(x) != (R_len_t)nr * nc)
        error(_("corrupt matrix -- dims not not match length"));
The issue is that nr*nc is an integer and the size of my matrix, 2.8 billion elements, exceeds C's limit, so the check forces the code to fail.
Yes, looks like a typo: R_len_t is an int, and that's how nr was declared. It should be R_xlen_t, which is bigger on machines that support big vectors.
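
The overflow described above can be illustrated outside R. This is only a minimal sketch, not the code in io.c: xlen_t is a hypothetical stand-in for R_xlen_t (assumed here to be a 64-bit signed integer, as on 64-bit platforms), and the helper names are invented for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for R_xlen_t; assumed to be 64-bit here. */
typedef int64_t xlen_t;

/* Widen one operand before multiplying, so the product is computed in
   64-bit arithmetic and cannot overflow for int-sized dimensions. */
static xlen_t matrix_length(int nr, int nc)
{
    return (xlen_t)nr * nc;
}

/* Returns 1 if nr * nc fits in an int. The product is formed in the wide
   type, so the check itself does not overflow. */
static int product_fits_int(int nr, int nc)
{
    return matrix_length(nr, nc) <= INT_MAX;
}

/* Integrity check in the spirit of the one quoted above, with the
   product computed in the wide type. */
static int dims_match_length(xlen_t length, int nr, int nc)
{
    return length == matrix_length(nr, nc);
}
```

With nr = 70000 and nc = 40000 (2.8 billion elements), product_fits_int() returns 0 while dims_match_length(2800000000, 70000, 40000) returns 1, which is the behaviour the widened cast is meant to give.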
I haven't tested the change; there may be something else in that function that assumes short vectors.
Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures.
That would be a definite fix, but before such deep rewriting is undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * nc") will be sufficient for cases where nr and nc are in int range but their product can reach the long vector limit:
replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                         &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
                         quote_col[j], qmethod,
                         &strBuf, sdec);
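
A side note on the index expression: in C the cast in "(R_xlen_t)i + j*nr" applies to i only, so if j and nr are plain ints (an assumption here, based on nr being declared as R_len_t) the product j*nr is still evaluated in int arithmetic. A minimal sketch of a column-major offset computed entirely in the wide type, with xlen_t as a hypothetical 64-bit stand-in for R_xlen_t and a made-up helper name:

```c
#include <stdint.h>

/* Hypothetical stand-in for R_xlen_t; assumed to be 64-bit here. */
typedef int64_t xlen_t;

/* Column-major offset of element (i, j) in a matrix with nr rows.
   The cast is applied to j, so the multiplication happens in 64 bits. */
static xlen_t colmajor_index(int i, int j, int nr)
{
    return (xlen_t)i + (xlen_t)j * nr;
}
```

For the last element of a 70000 x 40000 matrix, colmajor_index(69999, 39999, 70000) gives 2799999999, which does not fit in an int.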

Unfortunately we can't do that: x is a matrix of an atomic vector type. VECTOR_ELT takes elements of a generic vector, so it cannot be applied to "x". But even if we extracted a single element from "x" (e.g. via a type-switch etc.), we would not be able to pass it to EncodeElement0, which expects a full atomic vector (that is, including its header). Instead we would have to call functions like EncodeInteger, EncodeReal0, etc. on the individual elements, which is then the same as changing EncodeElement0 or implementing a new version of it. This does not seem that hard to fix, it is just not as trivial as changing the cast.
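
The shape of such a per-element encoder can be sketched in plain C. This is an illustration only: atomic_vec, elt_type and encode_element are hypothetical names, the struct is not R's internal vector representation, and snprintf stands in for R's own formatting routines, whose signatures are not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for R_xlen_t; assumed to be 64-bit here. */
typedef int64_t xlen_t;

/* Hypothetical tagged vector; NOT R's internal representation. */
typedef enum { ELT_INT, ELT_REAL } elt_type;

typedef struct {
    elt_type type;
    xlen_t length;
    union { const int *i; const double *d; } data;
} atomic_vec;

/* Encode the single element at a 64-bit index into buf.
   Dispatches on the element type instead of passing a whole vector.
   Returns the number of characters snprintf reports, or -1 if the
   index is out of range or the type is unknown. */
static int encode_element(const atomic_vec *x, xlen_t idx,
                          char *buf, size_t bufsize)
{
    if (idx < 0 || idx >= x->length)
        return -1;
    switch (x->type) {
    case ELT_INT:
        return snprintf(buf, bufsize, "%d", x->data.i[idx]);
    case ELT_REAL:
        return snprintf(buf, bufsize, "%.15g", x->data.d[idx]);
    }
    return -1;
}
```

The point of the sketch is the signature: the index parameter is wide, and the type switch happens per element, which is roughly what a changed or new EncodeElement0 would need.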


Tomas


Serguei


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
