Re: [Rd] Objectsize function visiting every element for alt-rep strings

2019-01-21 Thread Martin Maechler
> Travers Ching 
> on Tue, 15 Jan 2019 12:50:45 -0800 writes:

> I have a toy alt-rep string package that generates
> randomly seeded strings.  Example:

>   library(altstringisode)
>   x <- altrandomStrings(1e8)
>   head(x)
>   [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1"
>   "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
>   object.size(x)

> object.size() will call the set_altstring_Elt_method for
> every single element, materializing (slowly) every element
> of the vector.  This is a problem mostly in RStudio since
> object.size() is called automatically, defeating the purpose
> of alt-rep.

Hmm.  But still, the idea had been that object.size()  *should*
return the size of the "de-ALTREP'ed" object *but* should not
de-ALTREP it.
That's what happens for integers, but indeed fails to happen for
such as.character(.)ed integers.

From my eRum presentation (which took from the official ALTREP documentation
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html ) :

  > x <- 1:1e15
  > object.size(x) # 8000'000'000'000'048 bytes : 8000 TBytes -- ok, not really
  8048 bytes
  > is.unsorted(x) # FALSE : i.e., R *knows* it is sorted
  [1] FALSE
  > xs <- sort(x)  #
  > .Internal(inspect(x))
  @80255f8 14 REALSXP g0c0 [NAM(7)]  1 : 1000 (compact)
  > 

  > cx <- as.character(x)
  > .Internal(inspect(cx))
  @80485d8 16 STRSXP g0c0 [NAM(1)]   
    @80255f8 14 REALSXP g1c0 [MARK,NAM(7)]  1 : 1000 (compact)
  > system.time( print(object.size(x)), gc=FALSE)
  8048 bytes
 user  system elapsed 
0.000   0.000   0.001 
  > system.time( print(object.size(cx)), gc=FALSE)
  Error: cannot allocate vector of size 8388608.0 Gb
  Timing stopped at: 11.43 0 11.46
  > 

One could consider it a bug that object.size(cx) is indeed
inspecting every string, i.e., accessing cx[i] for all i.
Note that it is *not*  deALTREPing cx  itself :

> x <- 1:1e6
> cx <- as.character(x)
> .Internal(inspect(cx))

@7f5b1a0 16 STRSXP g0c0 [NAM(1)]   
  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 100 (compact)
> system.time( print(object.size(cx)), gc=FALSE)
6448 bytes
   user  system elapsed 
  0.369   0.005   0.374 
> .Internal(inspect(cx))
@7f5b1a0 16 STRSXP g0c0 [NAM(7)]   
  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 100 (compact)
> 
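
The difference between the two cases can be illustrated with a toy model in C. This is only a sketch under stated assumptions and not R's actual implementation: the names compact_seq, seq_expanded_bytes, strings_bytes, demo_elt and the constant HEADER_BYTES are all hypothetical. It shows that the expanded size of a compact sequence follows from its length alone, whereas the size of a lazily generated string vector depends on every element, so a naive size function ends up requesting each one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed per-vector overhead in bytes (an assumption). */
#define HEADER_BYTES 48ULL

/* Toy compact sequence: from, from+1, ..., from+n-1; nothing is stored. */
typedef struct {
    long long from;
    unsigned long long n;
} compact_seq;

/* Expanded size depends on the length only: O(1), no element is touched. */
unsigned long long seq_expanded_bytes(const compact_seq *s, size_t elt_size)
{
    return HEADER_BYTES + s->n * (unsigned long long) elt_size;
}

/* Toy lazy string vector: element i is produced on demand by a callback. */
typedef const char *(*elt_fn)(size_t i, void *state);

/* Each string has its own length, so every element gets requested: O(n). */
unsigned long long strings_bytes(size_t n, elt_fn get, void *state)
{
    unsigned long long total =
        HEADER_BYTES + (unsigned long long) n * sizeof(char *);
    for (size_t i = 0; i < n; i++)
        total += strlen(get(i, state)) + 1;   /* +1 for the terminating NUL */
    return total;
}

/* Demo getter: counts how often it is called and returns a fixed string. */
const char *demo_elt(size_t i, void *state)
{
    (void) i;
    ++*(size_t *) state;
    return "ab";
}
```

Calling strings_bytes(n, demo_elt, &calls) leaves calls equal to n, which mirrors the per-element access discussed above; avoiding it would require the lazy vector itself to report its size.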

> Is there a way to avoid the problem of forced
> materialization in RStudio?

> PS: Is there a way to tell if a post has been received by
> the mailing list?  How long does it take to show up in the
> archives?

[ that (waiting time) distribution is quite right skewed... I'd
  guess its median to be less than 10 minutes... but we had
  artificially delayed it somewhat in the past to fight
  spammers, and ETH (the hosting institution) and others have
  increased spam and virus filtering so everything has become
  quite a bit slower ]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] long-standing documentation bug in ?anova.lme

2019-01-21 Thread Martin Maechler
> Ben Bolker 
> on Thu, 17 Jan 2019 12:32:20 -0500 writes:

> tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
> some names are misspelled in ?lme.  I can submit all this stuff as a bug
> report if that's preferred.

> ?anova.lme says:

> When only one fitted model object is present, a data frame with
> the sums of squares, numerator degrees of freedom, denominator
> degrees of freedom, F-values, and P-values

> The output of

> fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
> anova(fm1)

> gives columns

> numDF denDF   F-value p-value

> -- i.e. the sums of squares aren't there!  (For fairly good reasons; lme
> doesn't actually compute them internally, and it might not always be
> straightforward to compute them, for more complex models. They would
> mostly be useful for comparison with simpler, method-of-moments based
> approaches like aov()). Federico Calboli pointed this out on r-help in
> 2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html


> Two more points:

> * the last sentence of the Description might need one fewer comma
> [after "statistic"] or one more [after "p-value"].
> * in ?lme, Littell's name is misspelled at least twice and Reinsel's
> at least once.

We'd be grateful for patches, thank you Ben!

Notably for 'nlme' and 'foreign', both of which are maintained
by R-core (rather than individual R core or R Foundation
members) we've also encouraged that  R's bugzilla be used for
non-trivial bug reports as that allows attached patches and
simple references too. 


> Is there a publicly accessible SVN server for recommended packages (in
> general) and nlme (in particular) anywhere?

nlme's SVN is physically at the same place as the R sources
(here at ETH Zurich), with URL

   https://svn.r-project.org/R-packages/trunk/nlme

In addition to 'nlme', at least 'foreign', 'mgcv' and
'cluster' are also maintained there.

Thank you for the question:
 I do think "we" should add the corresponding  svn URL to the
 respective DESCRIPTION file.

OTOH, 'Matrix' moved to R-forge a while ago, and I'm
currently not sure about the other Recommended packages
such as 'KernSmooth' or 'boot'.

Best,
Martin

Martin Maechler
ETH Zurich and R core team



[Rd] orderVector1 (sort.c): Tiny improvement concerning nalast

2019-01-21 Thread EMILIO TORRES MANZANERA

Dear Sir,

In the functions orderVector1 and orderVector1l (R-3.5.2/src/main/sort.c) there
are two loops concerning nalast (lines 1096 and 1105). I wonder whether they
could be restructured so that these functions become a little faster.

The first one (line 1096) can be folded into the preceding 'switch' block (line
1079) (see below). And if you rewrite or duplicate this 'switch' block (line 1079)
for the case nalast == FALSE, you should be able to avoid the loop at line 1105.

Best regards,
Emilio



*** /home/emilio/Descargas/R-3.5.2/src/main/sort.c  2018-11-07 00:15:02.0 +0100
--- /home/emilio/Descargas/R-3.5.2/src/main/sort2.c 2019-01-21 11:13:07.414332755 +0100
***
*** 1079,1099 
  switch (TYPEOF(key)) {
  case LGLSXP:
  case INTSXP:
!     for (i = 0; i < n; i++) isna[i] = (ix[i] == NA_INTEGER);
!     break;
  case REALSXP:
!     for (i = 0; i < n; i++) isna[i] = ISNAN(x[i]);
!     break;
  case STRSXP:
!     for (i = 0; i < n; i++) isna[i] = (sx[i] == NA_STRING);
!     break;
  case CPLXSXP:
!     for (i = 0; i < n; i++) isna[i] = ISNAN(cx[i].r) || ISNAN(cx[i].i);
!     break;
  default:
!     UNIMPLEMENTED_TYPE("orderVector1", key);
  }
! for (i = 0; i < n; i++) numna += isna[i];
  
  if(numna)
      switch (TYPEOF(key)) {
--- 1079, 
  switch (TYPEOF(key)) {
  case LGLSXP:
  case INTSXP:
!     for (i = 0; i < n; i++) {
!         isna[i] = (ix[i] == NA_INTEGER);
!         numna += isna[i];
!     }
!     break;
  case REALSXP:
!     for (i = 0; i < n; i++) {
!         isna[i] = ISNAN(x[i]);
!         numna += isna[i];
!     }
!     break;
  case STRSXP:
!     for (i = 0; i < n; i++) {
!         isna[i] = (sx[i] == NA_STRING);
!         numna += isna[i];
!     }
!     break;
  case CPLXSXP:
!     for (i = 0; i < n; i++) {
!         isna[i] = ISNAN(cx[i].r) || ISNAN(cx[i].i);
!         numna += isna[i];
!     }
!     break;
  default:
!     UNIMPLEMENTED_TYPE("orderVector1", key);
  }
! /*  for (i = 0; i < n; i++) numna += isna[i]; */
  
  if(numna)
      switch (TYPEOF(key)) {
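
Outside of R, the first suggestion (folding the counting loop into the flagging loop) can be sketched as a small stand-alone C example. It is only an illustration for double input, assuming the standard isnan() in place of R's ISNAN(); the function names flag_then_count and flag_and_count are hypothetical and do not appear in sort.c. Whether the fused version is measurably faster would still need to be benchmarked.

```c
#include <math.h>
#include <stddef.h>

/* Two passes: flag the NaNs first, then count the flags in a second loop. */
size_t flag_then_count(const double *x, size_t n, int *isna)
{
    size_t numna = 0;
    for (size_t i = 0; i < n; i++) isna[i] = isnan(x[i]) != 0;
    for (size_t i = 0; i < n; i++) numna += (size_t) isna[i];
    return numna;
}

/* One pass: flag and count together, in the spirit of the proposed patch. */
size_t flag_and_count(const double *x, size_t n, int *isna)
{
    size_t numna = 0;
    for (size_t i = 0; i < n; i++) {
        isna[i] = isnan(x[i]) != 0;
        numna += (size_t) isna[i];
    }
    return numna;
}
```

Both functions fill isna[] identically and return the same count; the second one simply traverses the data once instead of twice.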




-- 
=
Emilio Torres Manzanera
Fac. de Comercio - Universidad de Oviedo
c/ Luis Moya 261, E-33203 Gijón (Spain)
Tel. 985 182 197 email: tor...@uniovi.es
=


[Rd] pmax and long vector

2019-01-21 Thread Kasper Daniel Hansen
I see that base::pmax() does not support long vectors.

Is R-devel interested in reports like this, i.e., is there a goal of full
support for long vectors in "basic" functions, something I at least would
greatly appreciate?

MRE:

> pmax(rep(1L, 3*10^9), 0)

Error in pmax(rep(1L, 3 * 10^9), 0) :
  long vectors not supported yet:
../../../R-devel-src/src/include/Rinlinedfuns.h:522

Best,
Kasper



Re: [Rd] long-standing documentation bug in ?anova.lme

2019-01-21 Thread Ben Bolker

  Here are relevant patches to address the various issues described
below.  Thanks for the SVN info!

  cheers
Ben Bolker


On 2019-01-21 4:54 a.m., Martin Maechler wrote:
>> Ben Bolker 
>> on Thu, 17 Jan 2019 12:32:20 -0500 writes:
> 
> > tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
> > some names are misspelled in ?lme.  I can submit all this stuff as a bug
> > report if that's preferred.
> 
> > ?anova.lme says:
> 
> > When only one fitted model object is present, a data frame with
> > the sums of squares, numerator degrees of freedom, denominator
> > degrees of freedom, F-values, and P-values
> 
> > The output of
> 
> > fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
> > anova(fm1)
> 
> > gives columns
> 
> > numDF denDF   F-value p-value
> 
> > -- i.e. the sums of squares aren't there!  (For fairly good reasons; lme
> > doesn't actually compute them internally, and it might not always be
> > straightforward to compute them, for more complex models. They would
> > mostly be useful for comparison with simpler, method-of-moments based
> > approaches like aov()). Federico Calboli pointed this out on r-help in
> > 2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html
> 
> 
> > Two more points:
> 
> > * the last sentence of the Description might need one fewer comma
> > [after "statistic"] or one more [after "p-value"].
> > * in ?lme, Littell's name is misspelled at least twice and Reinsel's
> > at least once.
> 
> We'd be grateful for patches, thank you Ben!
> 
> Notably for 'nlme' and 'foreign', both of which are maintained
> by R-core (rather than individual R core or R Foundation
> members) we've also encouraged that  R's bugzilla be used for
> non-trivial bug reports as that allows attached patches and
> simple references too. 
> 
> 
> > Is there a publicly accessible SVN server for recommended packages (in
> > general) and nlme (in particular) anywhere?
> 
> nlme's SVN is physically at the same place as the R sources
> (here at ETH Zurich), with URL
> 
>    https://svn.r-project.org/R-packages/trunk/nlme
> 
> in addition to 'nlme', at least  'foreign', 'mgcv'  and
> 'cluster' are also maintained there.
> 
> Thank you for the question:
>  I do think "we" should add the corresponding  svn URL to the
>  respective DESCRIPTION file.
> 
> OTOH, 'Matrix' has moved to R-forge a while ago .. and I'm
> currently also not sure about the other Recommended packages
> such as 'KernSmooth' or 'boot' . 
> 
> Best,
> Martin
> 
> Martin Maechler
> ETH Zurich and R core team
> 
Index: nlme/DESCRIPTION
===
--- nlme/DESCRIPTION        (revision 7616)
+++ nlme/DESCRIPTION        (working copy)
@@ -21,3 +21,4 @@
 Encoding: UTF-8
 License: GPL (>= 2) | file LICENCE
 BugReports: https://bugs.r-project.org
+URL: https://svn.r-project.org/R-packages/trunk/nlme
\ No newline at end of file
Index: nlme/man/anova.lme.Rd
===
--- nlme/man/anova.lme.Rd   (revision 7616)
+++ nlme/man/anova.lme.Rd   (working copy)
@@ -61,7 +61,7 @@
 }
 \description{
   When only one fitted model object is present, a data frame with the
-  sums of squares, numerator degrees of freedom, denominator degrees of
+  numerator degrees of freedom, denominator degrees of
   freedom, F-values, and P-values for Wald tests for the terms in the
   model (when \code{Terms} and \code{L} are \code{NULL}), a combination
   of model terms (when \code{Terms} in not \code{NULL}), or linear
@@ -71,7 +71,7 @@
   log-likelihood, the Akaike Information Criterion (AIC), and the
   Bayesian Information Criterion (BIC) of each object is returned.  If
   \code{test=TRUE}, whenever two consecutive  objects have different
-  number of degrees of freedom, a likelihood ratio statistic, with the
+  number of degrees of freedom, a likelihood ratio statistic with the
   associated p-value is included in the returned data frame.
 }
 \value{
Index: nlme/man/lme.Rd
===
--- nlme/man/lme.Rd (revision 7616)
+++ nlme/man/lme.Rd (working copy)
@@ -117,8 +117,8 @@
   (1982).  The variance-covariance parametrizations are described in
   Pinheiro and Bates (1996).  The different correlation structures
   available for the \code{correlation} argument are described in Box,
-  Jenkins and Reinse (1994), Littel \emph{et al} (1996), and Venables and
-  Ripley, (2002). The use of variance functions for linear and nonlinear
+  Jenkins and Reinsel (1994), Littell \emph{et al} (1996), and Venables and
+  Ripley (2002). The use of variance functions for linear and nonlinear
   mixed effects models is presented in detail in Davidian and Giltinan
   (1995).
 
@@ -136,7 +136,7 @@
   Data", Journal of the American Statistical Association, 83,
   10

Re: [Rd] pmax and long vector

2019-01-21 Thread Duncan Murdoch

On 21/01/2019 12:35 p.m., Kasper Daniel Hansen wrote:

> I see that base::pmax() does not support long vectors.
>
> Is R-devel interested in reports like this; ie. is there a goal of full
> support for long vectors in "basic" functions, something I at least would
> greatly appreciate?
>
> MRE:
>
>> pmax(rep(1L, 3*10^9), 0)
>
> Error in pmax(rep(1L, 3 * 10^9), 0) :
>    long vectors not supported yet:
> ../../../R-devel-src/src/include/Rinlinedfuns.h:522



I think a carefully tested patch that fixes pmax (it would need to 
change this call from length() to xlength(), and make some other 
necessary changes that follow from this), would probably be useful to R 
Core, and could be posted to bugs.r-project.org.
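
To illustrate the kind of change meant here, a minimal stand-alone C sketch follows. It is not the code in R's sources: xlen_t is a hypothetical stand-in for a 64-bit length type, fits_in_int and pmax_scalar are made-up names, NA handling is ignored, and a 32-bit int is assumed. It shows why a length of 3*10^9 cannot pass through an int, and what a loop indexed with the wider type looks like.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for a 64-bit vector length type; an assumption for this sketch. */
typedef int64_t xlen_t;

/* Returns 1 if a length can be stored in a plain int, 0 otherwise. */
int fits_in_int(xlen_t n)
{
    return n >= 0 && n <= INT_MAX;
}

/* Element-wise maximum of x[] and a scalar, indexed with the wide type. */
void pmax_scalar(const int *x, xlen_t n, int bound, int *out)
{
    for (xlen_t i = 0; i < n; i++)
        out[i] = x[i] > bound ? x[i] : bound;
}
```

With a 32-bit int, fits_in_int(3000000000LL) is 0, so any code path that stores the length in an int has to either fail or be switched to the wider type, together with every loop index and allocation size derived from it.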


It might also be useful on R-devel to post a list of all known commonly 
used functions that don't support long vectors; this could be updated on 
a regular basis.  This might encourage people to produce patches as above.


I'm not so sure a report about a single function won't just get lost.

Duncan Murdoch



Re: [Rd] pmax and long vector

2019-01-21 Thread Gabriel Becker
Kasper,

If you're not interested or don't have time to create said patch yourself,
let me know and I can do it.

Best,
~G

On Mon, Jan 21, 2019, 11:36 AM Duncan Murdoch wrote:

> On 21/01/2019 12:35 p.m., Kasper Daniel Hansen wrote:
> > I see that base::pmax() does not support long vectors.
> >
> > Is R-devel interested in reports like this; ie. is there a goal of full
> > support for long vectors in "basic" functions, something I at least would
> > greatly appreciate?
> >
> > MRE:
> >
> >> pmax(rep(1L, 3*10^9), 0)
> >
> > Error in pmax(rep(1L, 3 * 10^9), 0) :
> >    long vectors not supported yet:
> > ../../../R-devel-src/src/include/Rinlinedfuns.h:522
>
>
> I think a carefully tested patch that fixes pmax (it would need to
> change this call from length() to xlength(), and make some other
> necessary changes that follow from this), would probably be useful to R
> Core, and could be posted to bugs.r-project.org.
>
> It might also be useful on R-devel to post a list of all known commonly
> used functions that don't support long vectors; this could be updated on
> a regular basis.  This might encourage people to produce patches as above.
>
> I'm not so sure a report about a single function won't just get lost.
>
> Duncan Murdoch


Re: [Rd] pmax and long vector

2019-01-21 Thread Kasper Daniel Hansen
Gabe, I don't (yet) know much about long vectors at the C level. So feel
free to address this.

Duncan, I'll see what I can do regarding systematically compiling a list of
functions without long vector support. These days I frequently work with
big enough matrices that I need it.

On Mon, Jan 21, 2019 at 3:09 PM Gabriel Becker wrote:

> Kasper,
>
> If you're not interested or dont have time to create said patch yourself
> let me know and i can do it.
>
> Best,
> ~G
>
> On Mon, Jan 21, 2019, 11:36 AM Duncan Murdoch wrote:
>
>> On 21/01/2019 12:35 p.m., Kasper Daniel Hansen wrote:
>> > I see that base::pmax() does not support long vectors.
>> >
>> > Is R-devel interested in reports like this; ie. is there a goal of full
>> > support for long vectors in "basic" functions, something I at least would
>> > greatly appreciate?
>> >
>> > MRE:
>> >
>> >> pmax(rep(1L, 3*10^9), 0)
>> >
>> > Error in pmax(rep(1L, 3 * 10^9), 0) :
>> >    long vectors not supported yet:
>> > ../../../R-devel-src/src/include/Rinlinedfuns.h:522
>>
>>
>> I think a carefully tested patch that fixes pmax (it would need to
>> change this call from length() to xlength(), and make some other
>> necessary changes that follow from this), would probably be useful to R
>> Core, and could be posted to bugs.r-project.org.
>>
>> It might also be useful on R-devel to post a list of all known commonly
>> used functions that don't support long vectors; this could be updated on
>> a regular basis.  This might encourage people to produce patches as above.
>>
>> I'm not so sure a report about a single function won't just get lost.
>>
>> Duncan Murdoch