Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements
On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>> Hello,
>>
>> I want to report a bug in R that is limiting my capabilities to export a matrix with write.csv or write.table with over 2,147,483,648 elements (C's int limit). I found this bug already reported about before: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However, there appears to be no solution or fixes in upcoming R version releases.
>>
>> The error message is coming from the writetable part of the utils package in the io.c source code (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
>>
>>     /* quick integrity check */
>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>         error(_("corrupt matrix -- dims not not match length"));
>>
>> The issue is that nr*nc is an integer and the size of my matrix, 2.8 billion elements, exceeds C's limit, so the check forces the code to fail.
>
> Yes, looks like a typo: R_len_t is an int, and that's how nr was declared. It should be R_xlen_t, which is bigger on machines that support big vectors.
>
> I haven't tested the change; there may be something else in that function that assumes short vectors.

Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures.

Tomas

> Duncan Murdoch
>
>> My version:
>>
>> R.Version()
>> $platform
>> [1] "x86_64-w64-mingw32"
>> $arch
>> [1] "x86_64"
>> $os
>> [1] "mingw32"
>> $system
>> [1] "x86_64, mingw32"
>> $status
>> [1] ""
>> $major
>> [1] "3"
>> $minor
>> [1] "4.3"
>> $year
>> [1] "2017"
>> $month
>> [1] "11"
>> $day
>> [1] "30"
>> $`svn rev`
>> [1] "73796"
>> $language
>> [1] "R"
>> $version.string
>> [1] "R version 3.4.3 (2017-11-30)"
>> $nickname
>> [1] "Kite-Eating Tree"
>>
>> Thank you,
>> Colton
>>
>> Colton Tousey
>> Research Associate II
>> P: 816.585.0300
>> E: colton.tou...@kc.frb.org
>> FEDERAL RESERVE BANK OF KANSAS CITY
>> 1 Memorial Drive * Kansas City, Missouri 64198 * www.kansascityfed.org
>>
>> [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
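The arithmetic behind the failing check can be illustrated with a minimal C sketch. This is not the code in io.c: it assumes a 32-bit `int` (which is what Duncan says `R_len_t` is) and uses `int64_t` as a stand-in for `R_xlen_t`. The names `len_t`, `xlen_t`, `product_fits_int` and `dims_match_length` are hypothetical, as is the 70000 x 40000 shape used below to reach 2.8 billion elements. The point is that in `(R_len_t)nr * nc` the cast applies to `nr` only, so the multiplication is still done in `int`; casting to the wider type first makes the product wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-ins for R's types (assumptions for this sketch only):
   len_t plays the role of R_len_t (an int, per the thread),
   xlen_t plays the role of R_xlen_t on a 64-bit build. */
typedef int     len_t;
typedef int64_t xlen_t;

/* Returns 1 if nr * nc is representable as an int, else 0.
   The product is formed in 64 bits, so the test itself cannot overflow. */
int product_fits_int(len_t nr, len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return (xlen_t)nr * nc <= INT_MAX;
}

/* The integrity check with the wide cast: converting nr first forces
   the multiplication to be carried out in xlen_t. */
int dims_match_length(xlen_t length, len_t nr, len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return length == (xlen_t)nr * nc;
}
```

With these definitions, `product_fits_int(70000, 40000)` is 0 (the product exceeds `INT_MAX`, which is 2,147,483,647), while `dims_match_length(2800000000LL, 70000, 40000)` is 1. As Tomas notes, widening this check alone would not make the rest of the function safe for long vectors.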
Re: [Rd] odd assignInNamespace / setGeneric interaction
> Michael Lawrence
>     on Wed, 18 Apr 2018 14:16:37 -0700 writes:

    > Hi Bill,
    > Ideally, your coworker would just make an alias (or shortcut or
    > whatever) for R that passed --no-save to R. I'll try to look into this
    > though.

    > Michael

Yes, indeed!

As some of you know, I've been using R (for ca 23 years now) almost only from ESS (Emacs Speaks Statistics).

There, I've activated '--no-save' for ca 20 years or so; nowadays (since Emacs has adopted "custom") I have had this in my ~/.emacs custom lines

    '(inferior-R-args "--no-restore-history --no-save ")

standalone (to paste into your own ~/.emacs):

    (custom-set-variables '(inferior-R-args "--no-restore-history --no-save "))

The currently fashionable IDE for R, RStudio, also allows setting such switches via its GUI:

Menu [Tools]
  --> (bottom) entry [Global Options]
  --> the first sidebar entry [R General]:
      Look for two lines mentioning "workspace" or ".RData" and change to 'save never' ( == --no-save), and nowadays I also recommend that my students not *read* these, i.e., '--no-restore'

---

@Michael: I'm not sure what you're considering. I feel that in general, there are already too many R startup tweaking possibilities, notably via environment variables.
[e.g., the current ways to pre-determine the active .libPaths() in R, and the fact that R calls R again during 'R CMD check' etc, sometimes drive me crazy when .libPaths() become incompatible for too many reasons ... yes, I'm digressing: that's another story]

If we wanted to allow using (yet another!) environment variable here, I'd at least make sure it is not consulted when explicit --no-save or --vanilla, etc. are used.

Martin

    > On Wed, Apr 18, 2018 at 1:38 PM, William Dunlap via R-devel wrote:
    >> A coworker got tired of having to type 'yes' or 'no' after quitting R: he
    >> never wanted to save the R workspace when quitting. So he added
    >> assignInNamespace lines to his .Rprofile file to replace base::q with
    >> one that, by default, called the original with save="no".
    >>
    >> utils::assignInNamespace(".qOrig", base::q, "base")
    >> utils::assignInNamespace("q", function(save = "no", ...)
    >>     base:::.qOrig(save = save, ...), "base")
    >>
    >> This worked fine until he decided to load the distr package:
    >>
    >> > suppressPackageStartupMessages(library(distr))
    >> Error: package or namespace load failed for ‘distr’ in
    >> loadNamespace(name):
    >> there is no package called ‘.GlobalEnv’
    >>
    >> distr calls setGeneric("q"), which indirectly causes the environment
    >> of base::q, .GlobalEnv, to be loaded as a namespace, causing the error.
    >> Giving his replacement q function the environment getNamespace("base")
    >> avoids the problem.
    >>
    >> I can reproduce the problem by making a package that just calls
    >> setGeneric("as.hexmode",...) and a NAMESPACE file with
    >> exportMethods("as.hexmode"). If my .Rprofile puts a version of as.hexmode
    >> with environment .GlobalEnv into the base namespace, then I get the same
    >> error when trying to load the package.
    >>
    >> I suppose this is mostly a curiosity and unlikely to happen to most people
    >> but it did confuse us for a while.
    >>
    >> Bill Dunlap
    >> TIBCO Software
    >> wdunlap tibco.com
    >>
    >> [[alternative HTML version deleted]]
Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements
On 19/04/2018 at 09:30, Tomas Kalibera wrote:
> On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
>> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>>> [...]
>>>     /* quick integrity check */
>>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>>         error(_("corrupt matrix -- dims not not match length"));
>>>
>>> The issue is that nr*nc is an integer and the size of my matrix, 2.8 billion elements, exceeds C's limit, so the check forces the code to fail.
>>
>> Yes, looks like a typo: R_len_t is an int, and that's how nr was declared. It should be R_xlen_t, which is bigger on machines that support big vectors.
>>
>> I haven't tested the change; there may be something else in that function that assumes short vectors.
>
> Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures.

That would be a definite fix, but before such deep rewriting is undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * nc") would be sufficient for cases where nr and nc are in int range but their product can reach the long vector limit:

replace

    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod, &strBuf, sdec);

by

    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0, quote_col[j], qmethod, &strBuf, sdec);

Serguei
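One detail of the index expression is worth a minimal C sketch, separate from the question of whether `VECTOR_ELT` applies here. Assuming `i`, `j` and `nr` are all declared `int` (the thread confirms this only for `nr`), the sub-expression `j*nr` in `(R_xlen_t)i + j*nr` is still multiplied in `int` before the addition, so it can overflow for a matrix of this size. Casting one operand of the multiplication avoids that. The function name `element_index` is hypothetical and `int64_t` stands in for `R_xlen_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of element (i, j) in a column-major matrix with nr rows.
   int64_t stands in for R_xlen_t (an assumption of this sketch).
   Casting j before multiplying makes j * nr a 64-bit product; writing
   (int64_t)i + j * nr would still evaluate j * nr in int. */
int64_t element_index(int i, int j, int nr)
{
    assert(i >= 0 && j >= 0 && nr > 0 && i < nr);
    return (int64_t)j * nr + i;
}
```

For the last element of a hypothetical 70000 x 40000 matrix, `element_index(69999, 39999, 70000)` gives 2799999999, which does not fit in a 32-bit `int`.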
Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements
On 04/19/2018 11:47 AM, Serguei Sokol wrote:
> On 19/04/2018 at 09:30, Tomas Kalibera wrote:
>> [...]
>> Indeed, I think the function won't work for long vectors because of EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be changed, including their signatures
>
> That would be a definite fix but before such deep rewriting is undertaken may the following small fix (in addition to "(R_xlen_t)nr * nc") will be sufficient for cases where nr and nc are in int range but their product can reach long vector limit:
>
> replace
>
>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod, &strBuf, sdec);
>
> by
>
>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0, quote_col[j], qmethod, &strBuf, sdec);

Unfortunately we can't do that, x is a matrix of an atomic vector type. VECTOR_ELT is taking elements of a generic vector, so it cannot be applied to "x".

But even if we extracted a single element from "x" (e.g. via a type-switch etc), we would not be able to pass it to EncodeElement0 which expects a full atomic vector (that is, including its header). Instead we would have to call functions like EncodeInteger, EncodeReal0, etc on the individual elements. Which is then the same as changing EncodeElement0 or implementing a new version of it.

This does not seem that hard to fix, just is not as trivial as changing the cast.

Tomas

> Serguei
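The shape of the change Tomas describes (switch on the element type, then format one element addressed by a wide index) can be sketched as follows. This is only an illustration in plain C: it does not use R's SEXP layout or its `EncodeInteger`/`EncodeReal0` functions, and `kind_t`, `atomic_view` and `encode_element_at` are hypothetical names introduced for the sketch, with `int64_t` standing in for `R_xlen_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tagged view of an atomic vector; R's real vector
   header and Encode* functions are not used here. */
typedef enum { KIND_INT, KIND_REAL } kind_t;

typedef struct {
    kind_t      kind;
    const void *data;    /* points at an int[] or a double[] */
    int64_t     length;  /* may exceed INT_MAX */
} atomic_view;

/* Formats element idx of x into buf (capacity n).
   Returns buf, or NULL if idx is out of range. */
const char *encode_element_at(const atomic_view *x, int64_t idx,
                              char *buf, size_t n)
{
    assert(x != NULL && buf != NULL && n > 0);
    if (idx < 0 || idx >= x->length)
        return NULL;
    switch (x->kind) {
    case KIND_INT:   /* per-type formatting of a single element */
        snprintf(buf, n, "%d", ((const int *)x->data)[idx]);
        break;
    case KIND_REAL:
        snprintf(buf, n, "%.15g", ((const double *)x->data)[idx]);
        break;
    }
    return buf;
}
```

The design point is that the index travels as a 64-bit value all the way to the element access, so no signature along the path narrows it to `int`; the number formatting here is a plain `snprintf` and makes no attempt to reproduce R's formatting rules.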
Re: [Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements
On 19/04/2018 at 12:15, Tomas Kalibera wrote:
> On 04/19/2018 11:47 AM, Serguei Sokol wrote:
>> replace
>>
>>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod, &strBuf, sdec);
>>
>> by
>>
>>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0, quote_col[j], qmethod, &strBuf, sdec);
>
> Unfortunately we can't do that, x is a matrix of an atomic vector type. VECTOR_ELT is taking elements of a generic vector, so it cannot be applied to "x".
>
> But even if we extracted a single element from "x" (e.g. via a type-switch etc), we would not be able to pass it to EncodeElement0 which expects a full atomic vector (that is, including its header). Instead we would have to call functions like EncodeInteger, EncodeReal0, etc on the individual elements. Which is then the same as changing EncodeElement0 or implementing a new version of it.
>
> This does not seem that hard to fix, just is not as trivial as changing the cast.

Thanks Tomas for this detailed explanation.

I would also like to report a problem with the list. It must be corrupted in some way, because besides Tomas' response I've received five or six (so far) dating spam messages. All of them come from two senders: Kristina Oliynik and Samantha Smith.

Serguei.
[Rd] Spam to R-* list posters
> Serguei Sokol
>     on Thu, 19 Apr 2018 13:29:54 +0200 writes:

[...]

    > Thanks Tomas for this detailed explanation.

    > I would like also to signal a problem with the list. It must be
    > corrupted in some way because beside the Tomas' response I've got five
    > or six (so far) dating spam. All of them coming from two emails:
    > Kristina Oliynik and
    > Samantha Smith .

Well, those are the current ones for you. They change over time, and in my experience you get about 10--20 (about once per hour; on purpose not exactly every 60 minutes) and then it stops.

I've replied to the thread "Hacked" on R-help yesterday:
https://stat.ethz.ch/pipermail/r-help/2018-April/452423.html

This started ca 2 weeks ago on R-help already, and today we've learned that even R-SIG-Mixed-Models is affected.

I think I don't see them anymore at all because my spam filters have adapted.

Note that

1. This is *NOT* from regular mailing list subscribers, and none of these spam messages come via the R mailing list servers.

2. It's still a huge pain and damaging to the reputation of the R lists, of course.

3. I had hoped we could wait and see it go away, but I may be wrong.

4. We have re-started discussing what could be done. One drastic measure would make mailing list usage *less* attractive by "munging" all posters' e-mail addresses.

-

For now, use your mail provider's spam filters to quickly get rid of this.

.. or more interestingly and clearly less legally: use R to write "mail bombs". Write an R function sending ca 10 e-mails per hour randomly to that address ... ;-)
I did something like that (with a shell script, not R) at the end of last millennium when I was younger and the internet was a much much smaller space than now...

Martin
Re: [Rd] odd assignInNamespace / setGeneric interaction
The problem is not specific to redefining the q function, but to the interaction of assignInNamespace and setGeneric. The latter requires, roughly, that the environment of the function being replaced by an S4 generic is (or is the descendant of) the environment in which it lives.

E.g., the following demonstrates the problem

    % R --quiet --vanilla
    > assignInNamespace("plot", function(x, ...) stop("No plotting allowed!"), getNamespace("graphics"))
    > library(stats4)
    Error: package or namespace load failed for ‘stats4’ in loadNamespace(name):
     there is no package called ‘.GlobalEnv’

and defining the bogus plot function in the graphics namespace avoids the problem

    % R --quiet --vanilla
    > assignInNamespace("plot", with(getNamespace("graphics"), function(x, ...) stop("No plotting allowed!")), getNamespace("graphics"))
    > library(stats4)
    >

I suppose people who use assignInNamespace get what they deserve.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Apr 19, 2018 at 2:33 AM, Martin Maechler wrote:
> [...]
Re: [Rd] Spam to R-* list posters
On 2018-04-19 09:40, Martin Maechler wrote:
> [...]
> 4. We have re-started discussing what could be done. One drastic measure would make mailing list usage *less* attractive by "munging" all poster's e-mail addresses.
>
> For now use your mail providers spam filters to quickly get rid of this.
>
> .. or more interestingly and clearly less legally: use R to write "mail bombs". Write an R function sending ca 10 e-mails per hour randomly to that address ... ;-)
> I did something like that (with a shell script, not R) at the end of last millennium when I was younger and the internet was a much much smaller space than now...

What about implementing "Mailhide", described in the Wikipedia article on "reCAPTCHA"?

'[F]or example, "mai...@example.com" would be converted to "mai...@example.com". The visitor would then click on the "..." and solve the CAPTCHA in order to obtain the full email address. One can also edit the pop-up code so that none of the address is visible.' (https://en.wikipedia.org/wiki/ReCAPTCHA)

Of course, this is easier for me to suggest, because I'm not in a position to actually implement it ;-)

Spencer Graves

p.s. I wish again to express my deep appreciation to Martin and the other members of the R Core team who have invested so much time and creativity into making The R Project for Statistical Computing the incredible service it is today. A good portion of humanity lives better today, because of problems that would not otherwise have been addressed as well as they have been without some important analysis done with R.

> Martin
Re: [Rd] odd assignInNamespace / setGeneric interaction
To clarify, I am going to fix the issue in the methods package (actually I already have but need to test further). There's no intent to change the behavior of q().

On Thu, Apr 19, 2018 at 8:39 AM, William Dunlap wrote:
> [...]