Re: [Rd] Objectsize function visiting every element for alt-rep strings
> Travers Ching
>     on Tue, 15 Jan 2019 12:50:45 -0800 writes:

    > I have a toy alt-rep string package that generates
    > randomly seeded strings. Example:

    >   library(altstringisode)
    >   x <- altrandomStrings(1e8)
    >   head(x)
    >   [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1"
    >   "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
    >   object.size(1e8)

    > Object.size will call the set_altstring_Elt_method for
    > every single element, materializing (slowly) every element
    > of the vector. This is a problem mostly in R-studio since
    > object.size is called automatically, defeating the purpose
    > of alt-rep.

Hmm. But still, the idea had been that object.size() *should*
return the size of the "de-ALTREP'ed" object *but* should not
de-ALTREP it. That's what happens for integers, but indeed fails
to happen for such as.character(.)ed integers.

From my eRum presentation (which took from the official ALTREP
documentation https://svn.r-project.org/R/branches/ALTREP/ALTREP.html ) :

  > x <- 1:1e15
  > object.size(x) # 8000'000'000'000'048 bytes : 8000 TBytes -- ok, not really
  8048 bytes
  > is.unsorted(x) # FALSE : i.e., R *knows* it is sorted
  [1] FALSE
  > xs <- sort(x) #
  > .Internal(inspect(x))
  @80255f8 14 REALSXP g0c0 [NAM(7)] 1 : 1000 (compact)
  >
  > cx <- as.character(x)
  > .Internal(inspect(cx))
  @80485d8 16 STRSXP g0c0 [NAM(1)]
    @80255f8 14 REALSXP g1c0 [MARK,NAM(7)] 1 : 1000 (compact)
  > system.time( print(object.size(x)), gc=FALSE)
  8048 bytes
     user  system elapsed
    0.000   0.000   0.001
  > system.time( print(object.size(cx)), gc=FALSE)
  Error: cannot allocate vector of size 8388608.0 Gb
  Timing stopped at: 11.43 0 11.46
  >

One could consider it a bug that object.size(cx) is indeed
inspecting every string, i.e., accessing cx[i] for all i.
Note that it is *not* deALTREPing cx itself :

  > x <- 1:1e6
  > cx <- as.character(x)
  > .Internal(inspect(cx))
  @7f5b1a0 16 STRSXP g0c0 [NAM(1)]
    @7f5adb0 13 INTSXP g0c0 [NAM(7)] 1 : 100 (compact)
  > system.time( print(object.size(cx)), gc=FALSE)
  6448 bytes
     user  system elapsed
    0.369   0.005   0.374
  > .Internal(inspect(cx))
  @7f5b1a0 16 STRSXP g0c0 [NAM(7)]
    @7f5adb0 13 INTSXP g0c0 [NAM(7)] 1 : 100 (compact)
  >

    > Is there a way to avoid the problem of forced
    > materialization in rstudio?

    > PS: Is there a way to tell if a post has been received by
    > the mailing list? How long does it take to show up in the
    > archives?

[ that (waiting time) distribution is quite right skewed...
  I'd guess its median to be less than 10 minutes...
  but we had artificially delayed it somewhat in the past to fight
  spammers, and ETH (the hosting institution) and others have
  increased spam and virus filtering so everything has become
  quite a bit slower ]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
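To make the cost described above concrete, here is a minimal sketch in C of why a size computation that asks for each element of a lazily generated string vector ends up doing work proportional to its length. This is a toy model only, not R's ALTREP API or the actual object.size() code; the names toy_lazy_strings, toy_elt and toy_size_by_visiting are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy model of a compact (lazy) string vector:
   element i is the decimal text of start + i, produced on demand. */
typedef struct {
    long   start;      /* first value of the underlying sequence */
    size_t n;          /* number of elements */
    size_t elt_calls;  /* how many elements have been produced so far */
} toy_lazy_strings;

/* Produce element i into buf and return its length in characters. */
static size_t toy_elt(toy_lazy_strings *v, size_t i, char *buf, size_t bufsize)
{
    v->elt_calls++;
    int len = snprintf(buf, bufsize, "%ld", v->start + (long) i);
    return (len < 0) ? 0 : (size_t) len;
}

/* A size estimate that visits every element: one toy_elt() call per
   element, so the cost grows linearly with v->n. */
static size_t toy_size_by_visiting(toy_lazy_strings *v)
{
    char buf[32];
    size_t total = 0;
    for (size_t i = 0; i < v->n; i++)
        total += toy_elt(v, i, buf, sizeof buf) + 1;  /* +1 for the NUL byte */
    return total;
}
```

For a vector of n elements the counter elt_calls ends at n, which mirrors the per-element access pattern observed in the timings above, even though the vector itself is never stored in expanded form.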
Re: [Rd] long-standing documentation bug in ?anova.lme
> Ben Bolker
>     on Thu, 17 Jan 2019 12:32:20 -0500 writes:

    > tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
    > some names are misspelled in ?lme. I can submit all this stuff as a bug
    > report if that's preferred.

    > ?anova.lme says:

    >   When only one fitted model object is present, a data frame with
    >   the sums of squares, numerator degrees of freedom, denominator
    >   degrees of freedom, F-values, and P-values

    > The output of

    >   fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
    >   anova(fm1)

    > gives columns

    >   numDF denDF F-value p-value

    > -- i.e. the sums of squares aren't there! (For fairly good reasons; lme
    > doesn't actually compute them internally, and it might not always be
    > straightforward to compute them, for more complex models. They would
    > mostly be useful for comparison with simpler, method-of-moments based
    > approaches like aov()). Federico Calboli pointed this out on r-help in
    > 2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html

    > Two more points:

    > * the last sentence of the Description might need one fewer comma
    >   [after "statistic"] or one more [after "p-value"].
    > * in ?lme, Littell's name is misspelled at least twice and Reinsel's
    >   at least once.

We'd be grateful for patches, thank you Ben!

Notably for 'nlme' and 'foreign', both of which are maintained
by R-core (rather than individual R core or R Foundation
members) we've also encouraged that R's bugzilla be used for
non-trivial bug reports as that allows attached patches and
simple references too.

    > Is there a publicly accessible SVN server for recommended packages (in
    > general) and nlme (in particular) anywhere?

nlme's SVN is physically at the same place as the R sources
(here at ETH Zurich), with URL

   https://svn.r-project.org/R-packages/trunk/nlme

in addition to 'nlme', at least 'foreign', 'mgcv' and
'cluster' are also maintained there.

Thank you for the question:
I do think "we" should add the corresponding svn URL to the
respective DESCRIPTION file.

OTOH, 'Matrix' has moved to R-forge a while ago .. and I'm
currently also not sure about the other Recommended packages
such as 'KernSmooth' or 'boot' .

Best,
Martin

Martin Maechler
ETH Zurich and R core team
[Rd] orderVector1 (sort.c): Tiny improvement concerning nalast
Dear Sir,

In the functions orderVector1 and orderVector1l
(R-3.5.2/src/main/sort.c) there are two loops concerning nalast
(lines 1096 and 1105). I wonder whether they could be restructured
so that these functions become a little faster.

The first one (line 1096) can be folded into the preceding 'switch'
block (line 1079); see below. And if you rewrite/duplicate this
'switch' block (line 1079) for the case nalast==false, you should be
able to avoid the loop at line 1105.

Best regards,
Emilio

*** /home/emilio/Descargas/R-3.5.2/src/main/sort.c  2018-11-07 00:15:02.0 +0100
--- /home/emilio/Descargas/R-3.5.2/src/main/sort2.c 2019-01-21 11:13:07.414332755 +0100
***
*** 1079,1099
      switch (TYPEOF(key)) {
      case LGLSXP:
      case INTSXP:
!         for (i = 0; i < n; i++) isna[i] = (ix[i] == NA_INTEGER);
!         break;
      case REALSXP:
!         for (i = 0; i < n; i++) isna[i] = ISNAN(x[i]);
!         break;
      case STRSXP:
!         for (i = 0; i < n; i++) isna[i] = (sx[i] == NA_STRING);
!         break;
      case CPLXSXP:
!         for (i = 0; i < n; i++) isna[i] = ISNAN(cx[i].r) || ISNAN(cx[i].i);
!         break;
      default:
!         UNIMPLEMENTED_TYPE("orderVector1", key);
      }
!     for (i = 0; i < n; i++) numna += isna[i];

      if(numna)
          switch (TYPEOF(key)) {
--- 1079,
      switch (TYPEOF(key)) {
      case LGLSXP:
      case INTSXP:
!         for (i = 0; i < n; i++) {
!             isna[i] = (ix[i] == NA_INTEGER);
!             numna += isna[i];
!         }
!         break;
      case REALSXP:
!         for (i = 0; i < n; i++){
!             isna[i] = ISNAN(x[i]);
!             numna += isna[i];
!         }
!         break;
      case STRSXP:
!         for (i = 0; i < n; i++){
!             isna[i] = (sx[i] == NA_STRING);
!             numna += isna[i];
!         }
!         break;
      case CPLXSXP:
!         for (i = 0; i < n; i++){
!             isna[i] = ISNAN(cx[i].r) || ISNAN(cx[i].i);
!             numna += isna[i];
!         }
!         break;
      default:
!         UNIMPLEMENTED_TYPE("orderVector1", key);
      }
!     /* for (i = 0; i < n; i++) numna += isna[i]; */

      if(numna)
          switch (TYPEOF(key)) {

--
=
Emilio Torres Manzanera
Fac. de Comercio - Universidad de Oviedo
c/ Luis Moya 261, E-33203 Gijón (Spain)
Tel. 985 182 197
email: tor...@uniovi.es
=
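The idea behind the patch, stripped of R's internals, can be sketched in plain C as follows. This is an illustration only and does not use R's headers: TOY_NA_INTEGER stands in for NA_INTEGER (which R defines as INT_MIN), and the function names mark_na_two_pass and mark_na_fused are hypothetical. Only the integer case is shown.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_INTEGER, which R defines as INT_MIN. */
#define TOY_NA_INTEGER INT_MIN

/* Two passes, as in the current code: first mark the NAs,
   then walk isna[] again to count them. */
static size_t mark_na_two_pass(const int *ix, int *isna, size_t n)
{
    size_t numna = 0;
    for (size_t i = 0; i < n; i++) isna[i] = (ix[i] == TOY_NA_INTEGER);
    for (size_t i = 0; i < n; i++) numna += (size_t) isna[i];
    return numna;
}

/* One pass, as in the proposed patch: mark and count together,
   so isna[] is traversed only once. */
static size_t mark_na_fused(const int *ix, int *isna, size_t n)
{
    size_t numna = 0;
    for (size_t i = 0; i < n; i++) {
        isna[i] = (ix[i] == TOY_NA_INTEGER);
        numna += (size_t) isna[i];
    }
    return numna;
}
```

Both variants fill isna[] identically and return the same count; the fused form simply saves one traversal. Whether that saving is measurable inside orderVector1 would need benchmarking on real inputs.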
[Rd] pmax and long vector
I see that base::pmax() does not support long vectors.

Is R-devel interested in reports like this; i.e. is there a goal of full
support for long vectors in "basic" functions, something I at least would
greatly appreciate?

MRE:

> pmax(rep(1L, 3*10^9), 0)

Error in pmax(rep(1L, 3 * 10^9), 0) :
  long vectors not supported yet:
../../../R-devel-src/src/include/Rinlinedfuns.h:522

Best,
Kasper
Re: [Rd] long-standing documentation bug in ?anova.lme
Here are relevant patches to address the various issues described
earlier in this thread. Thanks for the SVN info!

cheers
  Ben Bolker

On 2019-01-21 4:54 a.m., Martin Maechler wrote:
> We'd be grateful for patches, thank you Ben!
>
> [...]
>
> nlme's SVN is physically at the same place as the R sources
> (here at ETH Zurich), with URL
>
>    https://svn.r-project.org/R-packages/trunk/nlme
>
> [...]
>
> I do think "we" should add the corresponding svn URL to the
> respective DESCRIPTION file.

Index: nlme/DESCRIPTION
===================================================================
--- nlme/DESCRIPTION    (revision 7616)
+++ nlme/DESCRIPTION    (working copy)
@@ -21,3 +21,4 @@
 Encoding: UTF-8
 License: GPL (>= 2) | file LICENCE
 BugReports: https://bugs.r-project.org
+URL: https://svn.r-project.org/R-packages/trunk/nlme
\ No newline at end of file
Index: nlme/man/anova.lme.Rd
===================================================================
--- nlme/man/anova.lme.Rd       (revision 7616)
+++ nlme/man/anova.lme.Rd       (working copy)
@@ -61,7 +61,7 @@
 }
 \description{
   When only one fitted model object is present, a data frame with the
-  sums of squares, numerator degrees of freedom, denominator degrees of
+  numerator degrees of freedom, denominator degrees of
   freedom, F-values, and P-values for Wald tests for the terms in the
   model (when \code{Terms} and \code{L} are \code{NULL}), a combination
   of model terms (when \code{Terms} in not \code{NULL}), or linear
@@ -71,7 +71,7 @@
   log-likelihood, the Akaike Information Criterion (AIC), and the
   Bayesian Information Criterion (BIC) of each object is returned. If
   \code{test=TRUE}, whenever two consecutive objects have different
-  number of degrees of freedom, a likelihood ratio statistic, with the
+  number of degrees of freedom, a likelihood ratio statistic with the
   associated p-value is included in the returned data frame.
 }
 \value{
Index: nlme/man/lme.Rd
===================================================================
--- nlme/man/lme.Rd     (revision 7616)
+++ nlme/man/lme.Rd     (working copy)
@@ -117,8 +117,8 @@
   (1982). The variance-covariance parametrizations are described in
   Pinheiro and Bates (1996). The different correlation structures
   available for the \code{correlation} argument are described in Box,
-  Jenkins and Reinse (1994), Littel \emph{et al} (1996), and Venables and
-  Ripley, (2002). The use of variance functions for linear and nonlinear
+  Jenkins and Reinsel (1994), Littell \emph{et al} (1996), and Venables and
+  Ripley (2002). The use of variance functions for linear and nonlinear
   mixed effects models is presented in detail in Davidian and Giltinan
   (1995).

@@ -136,7 +136,7 @@
   Data", Journal of the American Statistical Association, 83, 10
Re: [Rd] pmax and long vector
On 21/01/2019 12:35 p.m., Kasper Daniel Hansen wrote:
> I see that base::pmax() does not support long vectors.
>
> Is R-devel interested in reports like this; ie. is there a goal of full
> support for long vectors in "basic" functions, something I at least would
> greatly appreciate?
>
> MRE:
>
>> pmax(rep(1L, 3*10^9), 0)
>
> Error in pmax(rep(1L, 3 * 10^9), 0) :
>   long vectors not supported yet:
> ../../../R-devel-src/src/include/Rinlinedfuns.h:522

I think a carefully tested patch that fixes pmax (it would need to
change this call from length() to xlength(), and make some other
necessary changes that follow from this), would probably be useful to R
Core, and could be posted to bugs.r-project.org.

It might also be useful on R-devel to post a list of all known commonly
used functions that don't support long vectors; this could be updated on
a regular basis. This might encourage people to produce patches as above.

I'm not so sure a report about a single function won't just get lost.

Duncan Murdoch
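As a rough illustration of the length() versus xlength() distinction mentioned above, the following self-contained C sketch models a length accessor that returns int (and therefore must refuse lengths above INT_MAX) next to one that returns a 64-bit length. It does not use R's headers: toy_vec, toy_length, toy_xlength and toy_pmax are hypothetical names, the -1 return value stands in for the error R raises, and NA handling is ignored.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical toy vector header: only the length matters here. */
typedef struct { int64_t xlen; } toy_vec;

/* int-returning accessor: cannot represent lengths above INT_MAX.
   Returns -1 in that case (R would signal an error instead). */
static int toy_length(const toy_vec *v)
{
    if (v->xlen > INT_MAX) return -1;
    return (int) v->xlen;
}

/* Long-vector-aware accessor: returns the full 64-bit length. */
static int64_t toy_xlength(const toy_vec *v)
{
    return v->xlen;
}

/* pmax-style elementwise maximum with recycling of the shorter
   argument. All lengths and indices are 64-bit, which is the kind of
   follow-on change a long-vector patch would also need. */
static void toy_pmax(const double *x, int64_t nx,
                     const double *y, int64_t ny, double *out)
{
    if (nx == 0 || ny == 0) return;       /* nothing to compute */
    int64_t n = (nx > ny) ? nx : ny;
    for (int64_t i = 0; i < n; i++) {
        double a = x[i % nx], b = y[i % ny];
        out[i] = (a > b) ? a : b;
    }
}
```

A length of 3 * 10^9, as in the MRE, exceeds INT_MAX on common platforms, so the int-based accessor fails while the 64-bit one does not; switching accessors alone is not enough unless loop counters and allocation sizes are widened as well.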
Re: [Rd] pmax and long vector
Kasper,

If you're not interested or don't have time to create said patch yourself,
let me know and I can do it.

Best,
~G

On Mon, Jan 21, 2019, 11:36 AM Duncan Murdoch wrote:

> I think a carefully tested patch that fixes pmax (it would need to
> change this call from length() to xlength(), and make some other
> necessary changes that follow from this), would probably be useful to R
> Core, and could be posted to bugs.r-project.org.
>
> [...]
Re: [Rd] pmax and long vector
Gabe,

I don't (yet) know much about long vectors at the C level. So feel free to
address this.

Duncan,

I'll see what I can do regarding systematically compiling a list of
functions without long vector support. These days I frequently work with
big enough matrices that I need it.

On Mon, Jan 21, 2019 at 3:09 PM Gabriel Becker wrote:

> Kasper,
>
> If you're not interested or dont have time to create said patch yourself
> let me know and i can do it.
>
> [...]