Re: [Rd] [bug] droplevels() also drop object attributes (comment…)

2017-05-16 Thread Martin Maechler
> Serge Bibauw 
> on Mon, 15 May 2017 11:59:32 -0400 writes:

> Hi,

> Just reporting a small bug… not really a big deal, but I don’t think that 
is intended: droplevels() also drops all object’s attributes.

Yes.  The help page for droplevels (or the simple definition of
'droplevels.factor') clearly indicates that the method for
factors is really just a call to   factor(x, exclude = *)

and that _is_ quite an important base function whose semantics
should not be changed lightly. Still, let's continue :

Looking a bit, I see that the current behavior of factor() {and
hence droplevels} has been unchanged in this respect  for the
whole history of R, well, at least for more than 17 years (R 1.0.1, April 2000).

I'd agree there _is_ a bug, at least in the documentation which
does *not* mention that currently, all attributes are dropped but "names",
"levels" (and "class").

OTOH, factor() would only need a small change to make it
preserve all attributes (but "class" and "levels" which are set explicitly).

I'm sure this will break some checks in some packages.
Is it worth it?

e.g., our own R  QC checks currently check (the printing of) the
following (in tests/reg-tests-2.R ):

> ## some tests of factor matrices
> A <- factor(7:12)
> dim(A) <- c(2, 3)
> A
     [,1] [,2] [,3]
[1,] 7    9    11
[2,] 8    10   12
Levels: 7 8 9 10 11 12
> str(A)
 factor [1:2, 1:3] 7 8 9 10 ...
 - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
> A[, 1:2]
     [,1] [,2]
[1,] 7    9
[2,] 8    10
Levels: 7 8 9 10 11 12
> A[, 1:2, drop=TRUE]
[1] 7  8  9  10
Levels: 7 8 9 10

with the proposed change to factor(),
the last call would change its result:

> A[, 1:2, drop=TRUE]
     [,1] [,2]
[1,] 7    9
[2,] 8    10
Levels: 7 8 9 10

because 'drop=TRUE' calls factor(..) and that would also
preserve the "dim" attribute.
I would think that the changed behavior _is_ better, and is also
according to documentation, because the help page for
 [.factor
explains that 'drop = TRUE' drops levels, but _not_ that it
transforms a factor matrix into a factor (vector).


Martin


> Example:

>> > test <- c("hello", "something", "hi")
>> > test <- factor(test)
>> > comment(test) <- "this is a test"
>> > attr(test, "description") <- "this is another test"
>> > attributes(test)
>> $levels
>> [1] "hello"     "hi"        "something"
>> 
>> $class
>> [1] "factor"
>> 
>> $comment
>> [1] "this is a test"
>> 
>> $description
>> [1] "this is another test"
>> 
>> > test <- droplevels(test)
>> > attributes(test)
>> $levels
>> [1] "hello"     "hi"        "something"
>> 
>> $class
>> [1] "factor"


> Serge
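
As long as factor() behaves this way, the attributes can be restored on the user side. The following is only a minimal sketch: the helper name droplevels_keep_attrs is hypothetical (it is not part of base R), and it simply re-attaches every attribute other than "levels", "class" and "names" after calling droplevels().

```r
# Hypothetical helper (not in base R): drop unused levels but keep
# user-set attributes such as "comment" or "description".
droplevels_keep_attrs <- function(x) {
  stopifnot(is.factor(x))
  # "levels", "class" and "names" are handled by droplevels() itself.
  keep <- setdiff(names(attributes(x)), c("levels", "class", "names"))
  saved <- attributes(x)[keep]
  y <- droplevels(x)
  for (nm in keep) attr(y, nm) <- saved[[nm]]
  y
}

# A factor with one unused level ("hi") and two extra attributes.
test <- factor(c("hello", "something"),
               levels = c("hello", "hi", "something"))
comment(test) <- "this is a test"
attr(test, "description") <- "this is another test"

kept <- droplevels_keep_attrs(test)
levels(kept)               # the unused level "hi" is gone
attr(kept, "description")  # the extra attribute is still there
```

The helper works whether or not droplevels() itself keeps the attributes, so it should stay valid if factor() is changed as discussed above.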

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Martin Maechler
> Hervé Pagès 
> on Mon, 15 May 2017 16:54:46 -0700 writes:

> Hi,
> On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
>> This is getting pretty convoluted.
>> 
>> The current behavior is consistent with the description at the top of
>> the help page -- it does not promise to stop evaluation once the first
>> non-TRUE is found.  That seems OK to me -- if you want sequencing you
>> can use
>> 
>> stopifnot(A)
>> stopifnot(B)
>> 
>> or
>> 
>> stopifnot(A && B)

> My main use case for using stopifnot() is argument checking. In that
> context, I like the conciseness of

> stopifnot(
> A,
> B,
> ...
> )

> I think it's a common use case (and a pretty natural thing to do) to
> order/organize the expressions in a way such that it only makes sense
> to continue evaluating if all was OK so far e.g.

> stopifnot(
> is.numeric(x),
> length(x) == 1,
> is.na(x)
> )

I agree.  And that's how I have used stopifnot() in many cases
myself, sometimes even more "extremely" than the above example,
using assertions that only make sense if previous assertions
were fulfilled, such as

stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)

or in the Matrix package, first checking some class properties
and then things that only make sense for objects with those properties.


> At least that's how things are organized in the stopifnot() calls that
> accumulated in my code over the years. That's because I was convinced
> that evaluation would stop at the first non-true expression (as
> suggested by the man page). Until recently when I got a warning issued
> by an expression located *after* the first non-true expression. This
> was pretty unexpected/confusing!

> If I can't rely on this "sequencing" feature, I guess I can always
> do

> stopifnot(A)
> stopifnot(B)
> ...

> but I lose the conciseness of calling stopifnot() only once.
> I could also use

> stopifnot(A && B && ...)

> but then I lose the conciseness of the error message i.e. it's going
> to be something like

> Error: A && B && ... is not TRUE

> which can be pretty long/noisy compared to the message that reports
> only the 1st error.


> Conciseness/readability of the single call to stopifnot() and
> conciseness of the error message are the features that made me
> adopt stopifnot() in the 1st place. 

Yes, and that had been my design goal when I created it.

I do tend to agree with Hervé and Serguei here.

> If stopifnot() cannot be revisited
> to do "sequencing" then that means I will need to revisit all my calls
> to stopifnot().

>> 
>> I could see an argument for a change that in the multiple argument
>> case reports _all_ that fail; that would seem more useful to me than
>> twisting the code into knots.

Interesting... but really differing from the current documentation,

> Why not. Still better than the current situation. But only if that
> semantic seems more useful to people. Would be sad if usefulness
> of one semantic or the other was decided based on trickiness of
> implementation.

Well, the trickiness  should definitely play a role.
Apart from functionality and semantics, long term maintenance
and code readability, even elegance have shown to be very
important aspects of good code in ca 30 years of S and R programming.

OTOH, as mentioned above, the creation of good error messages
has been an important design goal of  stopifnot()  and hence I'm
willing to accept the extra complexity of "patching up" the call
used in the error / warning messages.

Also, as a change to what I posted yesterday, I now plan to follow
Peter Dalgaard's suggestion of using
 eval( .. ) 
instead of   eval(cl[[i]], envir = )
as there may be cases where the former behaves better in lazy
evaluation situations.
(Other opinions on that ?)

Martin

> Thanks,
> H.

>> 
>> Best,
>> 
>> luke
>> 
>> On Mon, 15 May 2017, Martin Maechler wrote:
>> 
 Serguei Sokol 
 on Mon, 15 May 2017 16:32:20 +0200 writes:
>>> 
>>> > Le 15/05/2017 à 15:37, Martin Maechler a écrit :
>>> >>> Serguei Sokol 
>>> >>> on Mon, 15 May 2017 13:14:34 +0200 writes:
>>> >> > I see in the archives that the attachment cannot pass.
>>> >> > So, here is the code:
>>> >>
>>> >> [... MM: I needed to reformat etc to match closely to
>>> >> the current source code which is in
>>> >>
>>> 
https://svn.r-project.org/R/trunk/src/library/base/R/stop.R
>>> >> or its corresponding github mirror
>>> >>
>>>

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Serguei Sokol

Le 15/05/2017 à 19:41, luke-tier...@uiowa.edu a écrit :

This is getting pretty convoluted.

The current behavior is consistent with the description at the top of
the help page -- it does not promise to stop evaluation once the first
non-TRUE is found.

Hm... we can read in the man page :
‘stopifnot(A, B)’ is conceptually equivalent to

  { if(any(is.na(A)) || !all(A)) stop(...);
if(any(is.na(B)) || !all(B)) stop(...) }
and this behavior does promise to stop at first non-TRUE value
without evaluation of the rest of conditions.

Sergueï.


[Rd] Wish for arima function: add a data argument and a formula-type for regressors

2017-05-16 Thread Olivier Renaud
Hi,

Using arima on data that are in a data frame, especially when adding 
xreg, would be much easier if the arima function contained

1) a "data=" argument

2) the possibility to include the covariate(s) in a formula style.

Ideally the call could be something like

 > arima(symptome, order=c(1,0,0), xreg=~trait01*mesure0, data=anxiete)

( or arima(symptome~trait01*mesure0, order=c(1,0,0), data=anxiete)   )

instead of present:

 > anxiete$interact = anxiete$trait01*anxiete$mesure0
 > arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01", 
"mesure0", "interact")])


Background: Especially in psychology, so-called single case analyses 
often consist of the interaction effect of treatment and the usual 
training effect, typically with ARMA-type errors, resulting in the 
above model. Typically, all the needed data are in a data.frame.

An additional advantage concerns the names of the coefficients in the 
output. With only one regressor:

>arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01")]) [...]
Coefficients:
         ar1  intercept  anxiete[, c("trait01")]
      0.5649    33.8623                  -8.1225
s.e.  0.1073     0.5969                   0.8052

but the name convention changes with several regressors:

>arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01", 
"mesure0", "interact")]) [...]
Coefficients:
         ar1  intercept  trait01  mesure0  interact
      0.2715    34.1363  -5.5777   0.0075   -0.1809
s.e.  0.1211     0.6685   0.9009   0.0342    0.0490
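
Until arima() accepts a formula, the regressor matrix can be built from one with the standard model-frame tools. This is only a sketch under stated assumptions: xreg_from_formula is a hypothetical helper name, and the small data frame is made-up illustration data that reuses the column names from this message, not the real `anxiete` data set.

```r
# Hypothetical helper: expand a formula into a numeric regressor matrix
# suitable for the xreg= argument of arima().
xreg_from_formula <- function(formula, data) {
  # na.pass keeps every row, so the series is not silently shortened.
  mf <- model.frame(formula, data, na.action = na.pass)
  X <- model.matrix(attr(mf, "terms"), mf)
  # arima() fits its own intercept, so drop the one from the formula.
  X[, colnames(X) != "(Intercept)", drop = FALSE]
}

# Made-up illustration data with the column names used above.
toy <- data.frame(
  symptome = c(34, 35, 33, 29, 28, 27),
  trait01  = c(0, 0, 0, 1, 1, 1),
  mesure0  = 1:6
)

X <- xreg_from_formula(~ trait01 * mesure0, toy)
colnames(X)

# With the real data the call would then look like:
#   arima(anxiete$symptome, order = c(1, 0, 0),
#         xreg = xreg_from_formula(~ trait01 * mesure0, anxiete))
```

Because xreg is then always a matrix with column names, the coefficient names follow the formula terms regardless of how many regressors there are.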



-- 
Prof. Olivier Renaud  http://www.unige.ch/fapse/mad/
Methodology & Data Analysis - Psychology Dept - University of Geneva
UniMail, Office 4138  -  40, Bd du Pont d'Arve   -  CH-1211 Geneva 4





[Rd] tweaking Sys.timezone()

2017-05-16 Thread Serguei Sokol

Hi,

On my system (Linux, Mageia 5) Sys.timezone() returns NA but
with a minor tweak it could work as expected, i.e. returning "Europe/Paris".

Here is the problem. At some moment it does

 lt <- normalizePath("/etc/localtime")

On my system /etc/localtime is a symlink pointing to
/usr/share/zoneinfo/Europe/Paris. So far so good.
With the next two operations the right answer should be found:

 if (grepl(pat <- "^/usr/share/zoneinfo/", lt)) sub(pat, "", lt)

Unfortunately, on my system "/usr/share" is also a symlink
so lt resolves to "/home/local/usr_share/zoneinfo/Europe/Paris"
and not to "/usr/share/zoneinfo/Europe/Paris".
So the test above fails.
As the keyword in this story is zoneinfo, could we modify the
pat to look as

  if (grepl(pat <- "^.*/zoneinfo/", lt)) sub(pat, "", lt)

?
In this way, we make no assumption about where exactly
"zoneinfo/*" resides. We have found it, no matter where, so use it.
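
The effect of the relaxed pattern can be checked on plain strings, without touching the file system. The following is a small sketch; tz_from_path is a hypothetical helper name used only for illustration and is not the actual code of Sys.timezone().

```r
# Hypothetical helper: extract the zone name from a resolved path,
# wherever the zoneinfo directory happens to live.
tz_from_path <- function(lt) {
  pat <- "^.*/zoneinfo/"
  if (grepl(pat, lt)) sub(pat, "", lt) else NA_character_
}

tz_from_path("/usr/share/zoneinfo/Europe/Paris")
tz_from_path("/home/local/usr_share/zoneinfo/Europe/Paris")
tz_from_path("/etc/localtime")   # no zoneinfo component: NA
```

Both zoneinfo paths yield "Europe/Paris". Note that the pattern is greedy, so everything up to the last "/zoneinfo/" in the path is removed.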

Hoping it could find its way into a next R release.

Best,
Serguei.



Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread luke-tierney

On Tue, 16 May 2017, Serguei Sokol wrote:


Le 15/05/2017 à 19:41, luke-tier...@uiowa.edu a écrit :

This is getting pretty convoluted.

The current behavior is consistent with the description at the top of
the help page -- it does not promise to stop evaluation once the first
non-TRUE is found.

Hm... we can read in the man page :
‘stopifnot(A, B)’ is conceptually equivalent to

 { if(any(is.na(A)) || !all(A)) stop(...);
   if(any(is.na(B)) || !all(B)) stop(...) }
and this behavior does promise to stop at first non-TRUE value
without evaluation of the rest of conditions.


Yes: that is why I explicitly referenced the description at the top of
the page.

Changing the 'conceptually equivalent' bit to reflect what is
happening is easy.  The changes being discussed, and their long term
maintenance, are not.

Best,

luke




Sergueï.



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread luke-tierney

On Tue, 16 May 2017, Martin Maechler wrote:


Hervé Pagès 
on Mon, 15 May 2017 16:54:46 -0700 writes:


   > Hi,
   > On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
   >> This is getting pretty convoluted.
   >>
   >> The current behavior is consistent with the description at the top of
   >> the help page -- it does not promise to stop evaluation once the first
   >> non-TRUE is found.  That seems OK to me -- if you want sequencing you
   >> can use
   >>
   >> stopifnot(A)
   >> stopifnot(B)
   >>
   >> or
   >>
   >> stopifnot(A && B)

   > My main use case for using stopifnot() is argument checking. In that
   > context, I like the conciseness of

   > stopifnot(
   > A,
   > B,
   > ...
   > )

   > I think it's a common use case (and a pretty natural thing to do) to
   > order/organize the expressions in a way such that it only makes sense
   > to continue evaluating if all was OK so far e.g.

   > stopifnot(
   > is.numeric(x),
   > length(x) == 1,
   > is.na(x)
   > )

I agree.  And that's how I have used stopifnot() in many cases
myself, sometimes even more "extremely" than the above example,
using assertions that only make sense if previous assertions
were fulfilled, such as

   stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)

or in the Matrix package, first checking some class properties
and then things that only make sense for objects with those properties.


   > At least that's how things are organized in the stopifnot() calls that
   > accumulated in my code over the years. That's because I was convinced
   > that evaluation would stop at the first non-true expression (as
   > suggested by the man page). Until recently when I got a warning issued
   > by an expression located *after* the first non-true expression. This
   > was pretty unexpected/confusing!

   > If I can't rely on this "sequencing" feature, I guess I can always
   > do

   > stopifnot(A)
   > stopifnot(B)
   > ...

   > but I lose the conciseness of calling stopifnot() only once.
   > I could also use

   > stopifnot(A && B && ...)

   > but then I lose the conciseness of the error message i.e. it's going
   > to be something like

   > Error: A && B && ... is not TRUE

   > which can be pretty long/noisy compared to the message that reports
   > only the 1st error.


   > Conciseness/readability of the single call to stopifnot() and
   > conciseness of the error message are the features that made me
   > adopt stopifnot() in the 1st place.

Yes, and that had been my design goal when I created it.

I do tend to agree with Hervé and Serguei here.

   > If stopifnot() cannot be revisited
   > to do "sequencing" then that means I will need to revisit all my calls
   > to stopifnot().

   >>
   >> I could see an argument for a change that in the multiple argument
   >> case reports _all_ that fail; that would seem more useful to me than
   >> twisting the code into knots.

Interesting... but really differing from the current documentation,

   > Why not. Still better than the current situation. But only if that
   > semantic seems more useful to people. Would be sad if usefulness
   > of one semantic or the other was decided based on trickiness of
   > implementation.

Well, the trickiness  should definitely play a role.
Apart from functionality and semantics, long term maintenance
and code readability, even elegance have shown to be very
important aspects of good code in ca 30 years of S and R programming.

OTOH, as mentioned above, the creation of good error messages
has been an important design goal of  stopifnot()  and hence I'm
willing to accept the extra complexity of "patching up" the call
used in the error / warning messages.

Also, as a change to what I posted yesterday, I now plan to follow
Peter Dalgaard's suggestion of using
eval( .. )
instead of   eval(cl[[i]], envir = )
as there may be cases where the former behaves better in lazy
evaluation situations.
(Other opinions on that ?)


If you go this route it would be useful to step back and think about
whether there might be some useful primitives to add to make this
easier, such as

- provide a dotsLength function for computing the number of arguments
  captured in a ... argument

- provide a dotsElt function for extracting the i-th element
  instead of going through the eval(sprintf("..%d", i)) construct.

- maybe something for extracting the expression for the i-th argument.

These might be more generally useful and make the code more readable and
maintainable.
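
For reference, R 3.5.0 and later provide `...length()` and `...elt(n)`, which correspond closely to the first two suggestions. The sketch below shows how a sequential check could be written with them. It is an illustration only: check_in_order is a hypothetical name, and this is not the implementation of stopifnot().

```r
# Hypothetical helper (requires R >= 3.5.0 for ...length() and ...elt()).
# Evaluates its arguments one at a time and stops at the first one
# that is not all TRUE, without evaluating the remaining ones.
check_in_order <- function(...) {
  # Unevaluated argument expressions, used only for the error message.
  exprs <- as.list(substitute(list(...)))[-1L]
  for (i in seq_len(...length())) {
    ok <- ...elt(i)   # forces only the i-th argument
    if (!(is.logical(ok) && !anyNA(ok) && all(ok)))
      stop(sprintf("%s is not TRUE", deparse(exprs[[i]])[1L]),
           call. = FALSE)
  }
  invisible(NULL)
}

n <- 3
check_in_order(is.numeric(n), length(n) == 1, n == round(n), n >= 0)

# The second argument is never evaluated here, so no message is printed.
try(check_in_order(2 + 2 == 5, print("Hey!!!") == "Hey!!!"))
```

The error message names only the first failing expression, which keeps the conciseness discussed earlier in the thread.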

Best,

luke



Martin

   > Thanks,
   > H.

   >>
   >> Best,
   >>
   >> luke
   >>
   >> On Mon, 15 May 2017, Martin Maechler wrote:
   >>
    Serguei Sokol 
    on Mon, 15 May 2017 16:32:20 +0200 writes:
   >>>
   >>> > Le 15/05/2017 à 15:37, Martin Maechler a écrit :
   >>> >>> Serguei Sokol 
   >>> >>> on Mon, 15 May 2017 13:14:34 +0200 writes:
   >>> >> > I see in the archives that the attachment cannot pass.
   >>> >> > So, here is the co

Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Suharto Anggono Suharto Anggono via R-devel
switch(i, ...)
extracts 'i'-th argument in '...'. It is like
eval(as.name(paste0("..", i))) .

Just mentioning other things:
- For 'n',
n <- nargs()
can be used.
- sys.call() can be used in place of match.call() .
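
A short sketch of these two points follows. The function names nth_arg and count_args are hypothetical and exist only for illustration.

```r
# switch() with a numeric first argument returns the i-th of the
# remaining arguments and evaluates only that one.
nth_arg <- function(i, ...) switch(i, ...)

nth_arg(2, "a", "b", "c")

# The first alternative would signal an error if it were evaluated;
# it is skipped because only the second one is selected.
nth_arg(2, stop("never evaluated"), "b")

# nargs() counts the arguments supplied to the call, including those
# matched by '...'.
count_args <- function(...) nargs()
count_args(1, 2, 3)
```

Both calls to nth_arg() return "b" and count_args(1, 2, 3) returns 3. If `i` is larger than the number of alternatives, switch() returns NULL invisibly, so a loop built on it needs its own upper bound.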
---
> peter dalgaard 
> on Mon, 15 May 2017 16:28:42 +0200 writes:

> I think Hervé's idea was just that if switch can evaluate arguments 
selectively, so can stopifnot(). But switch() is .Primitive, so does it from C. 

if he just meant that, then "yes, of course" (but not so interesting).

> I think it is almost a no-brainer to implement a sequential stopifnot if 
dropping to C code is allowed. In R it gets trickier, but how about this:

Something like this, yes, that's close to what Serguei Sokol had proposed
(and of course I *do*  want to keep the current sophistication
 of stopifnot(), so this is really too simple)

> Stopifnot <- function(...)
> {
> n <- length(match.call()) - 1
> for (i in 1:n)
> {
> nm <- as.name(paste0("..",i))
> if (!eval(nm)) stop("not all true")
> }
> }
> Stopifnot(2+2==4)
> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)


>> On 15 May 2017, at 15:37 , Martin Maechler  wrote:
>> 
>> I'm still curious about Hervé's idea on using  switch()  for the
>> issue.

> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread peter dalgaard

> On 16 May 2017, at 18:37 , Suharto Anggono Suharto Anggono via R-devel 
>  wrote:
> 
> switch(i, ...)
> extracts 'i'-th argument in '...'. It is like
> eval(as.name(paste0("..", i))) .

Hey, that's pretty neat! 

-pd

> 
> Just mentioning other things:
> - For 'n',
> n <- nargs()
> can be used.
> - sys.call() can be used in place of match.call() .
> ---
>> peter dalgaard 
>>on Mon, 15 May 2017 16:28:42 +0200 writes:
> 
>> I think Hervé's idea was just that if switch can evaluate arguments 
>> selectively, so can stopifnot(). But switch() is .Primitive, so does it from 
>> C. 
> 
> if he just meant that, then "yes, of course" (but not so interesting).
> 
>> I think it is almost a no-brainer to implement a sequential stopifnot if 
>> dropping to C code is allowed. In R it gets trickier, but how about this:
> 
> Something like this, yes, that's close to what Serguei Sokol had proposed
> (and of course I *do*  want to keep the current sophistication
> of stopifnot(), so this is really too simple)
> 
>> Stopifnot <- function(...)
>> {
>> n <- length(match.call()) - 1
>> for (i in 1:n)
>> {
>> nm <- as.name(paste0("..",i))
>> if (!eval(nm)) stop("not all true")
>> }
>> }
>> Stopifnot(2+2==4)
>> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
>> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
>> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)
> 
> 
>>> On 15 May 2017, at 15:37 , Martin Maechler  
>>> wrote:
>>> 
>>> I'm still curious about Hervé's idea on using  switch()  for the
>>> issue.
> 
>> -- 
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Martin Maechler
>   
> on Tue, 16 May 2017 09:49:56 -0500 writes:

> On Tue, 16 May 2017, Martin Maechler wrote:
>>> Hervé Pagès 
>>> on Mon, 15 May 2017 16:54:46 -0700 writes:
>> 
>> > Hi,
>> > On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
>> >> This is getting pretty convoluted.
>> >>
>> >> The current behavior is consistent with the description at the top of
>> >> the help page -- it does not promise to stop evaluation once the first
>> >> non-TRUE is found.  That seems OK to me -- if you want sequencing you
>> >> can use
>> >>
>> >> stopifnot(A)
>> >> stopifnot(B)
>> >>
>> >> or
>> >>
>> >> stopifnot(A && B)
>> 
>> > My main use case for using stopifnot() is argument checking. In that
>> > context, I like the conciseness of
>> 
>> > stopifnot(
>> > A,
>> > B,
>> > ...
>> > )
>> 
>> > I think it's a common use case (and a pretty natural thing to do) to
>> > order/organize the expressions in a way such that it only makes sense
>> > to continue evaluating if all was OK so far e.g.
>> 
>> > stopifnot(
>> > is.numeric(x),
>> > length(x) == 1,
>> > is.na(x)
>> > )
>> 
>> I agree.  And that's how I have used stopifnot() in many cases
>> myself, sometimes even more "extremely" than the above example,
>> using assertions that only make sense if previous assertions
>> were fulfilled, such as
>> 
>> stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)
>> 
>> or in the Matrix package, first checking some class properties
>> and then things that only make sense for objects with those properties.
>> 
>> 
>> > At least that's how things are organized in the stopifnot() calls that
>> > accumulated in my code over the years. That's because I was convinced
>> > that evaluation would stop at the first non-true expression (as
>> > suggested by the man page). Until recently when I got a warning issued
>> > by an expression located *after* the first non-true expression. This
>> > was pretty unexpected/confusing!
>> 
>> > If I can't rely on this "sequencing" feature, I guess I can always
>> > do
>> 
>> > stopifnot(A)
>> > stopifnot(B)
>> > ...
>> 
>> > but I lose the conciseness of calling stopifnot() only once.
>> > I could also use
>> 
>> > stopifnot(A && B && ...)
>> 
>> > but then I lose the conciseness of the error message i.e. it's going
>> > to be something like
>> 
>> > Error: A && B && ... is not TRUE
>> 
>> > which can be pretty long/noisy compared to the message that reports
>> > only the 1st error.
>> 
>> 
>> > Conciseness/readability of the single call to stopifnot() and
>> > conciseness of the error message are the features that made me
>> > adopt stopifnot() in the 1st place.
>> 
>> Yes, and that had been my design goal when I created it.
>> 
>> I do tend to agree with Hervé and Serguei here.
>> 
>> > If stopifnot() cannot be revisited
>> > to do "sequencing" then that means I will need to revisit all my calls
>> > to stopifnot().
>> 
>> >>
>> >> I could see an argument for a change that in the multiple argument
>> >> case reports _all_ that fail; that would seem more useful to me than
>> >> twisting the code into knots.
>> 
>> Interesting... but really differing from the current documentation,
>> 
>> > Why not. Still better than the current situation. But only if that
>> > semantic seems more useful to people. Would be sad if usefulness
>> > of one semantic or the other was decided based on trickiness of
>> > implementation.
>> 
>> Well, the trickiness  should definitely play a role.
>> Apart from functionality and semantics, long term maintenance
>> and code readability, even elegance have shown to be very
>> important aspects of good code in ca 30 years of S and R programming.
>> 
>> OTOH, as mentioned above, the creation of good error messages
>> has been an important design goal of  stopifnot()  and hence I'm
>> willing to accept the extra complexity of "patching up" the call
>> used in the error / warning messages.
>> 
>> Also, as a change to what I posted yesterday, I now plan to follow
>> Peter Dalgaard's suggestion of using
>> eval( .. )
>> instead of   eval(cl[[i]], envir = )
>> as there may be cases where the former behaves better in lazy
>> evaluation situations.
>> (Other opinions on that ?)

> If you go this route it would be useful to step back and think about
> whether there might be some useful primitives to add to make this
> easier, such as

> - provide a dotsLength function for computing the number of arguments
> captured in a ... argument

actually my current version did not use that

[Rd] Consider increasing the size of HSIZE

2017-05-16 Thread Jim Hester
The HSIZE constant, which sets the size of the hash table used to
store symbols is currently defined as `#define HSIZE 4119`. This value
was last increased in r5182 on 1999-07-15.

https://github.com/jimhester/hashsize#readme contains code that
simulates a normal R workflow by loading a handful of packages. In the
example more than 20,000 symbols are included in the hash table,
resulting in a load factor of greater than 5. The histogram in the
linked repository shows the distribution of bucket sizes for the hash
table.

This high load factor means most queries into the hashtable result in
a collision, requiring an additional linear search of the linked list
for each bucket. It is common for growable hash tables to increase
their size when the load factor is greater than .75, so I think it
would be of benefit to increase the HSIZE constant considerably; to
32768 or possibly 65536. This will result in increased memory
requirements for the hash table, but far fewer collisions.
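
The arithmetic behind these figures can be reproduced as follows. This is a back-of-the-envelope sketch: the symbol count is the lower bound quoted above ("more than 20,000"), and the memory estimate assumes one 8-byte pointer per bucket on a 64-bit build, which is an assumption made here and not a number taken from the R sources.

```r
# Lower bound on the symbol count quoted in the message.
n_symbols <- 20000

# Current and proposed table sizes.
hsize <- c(current = 4119, proposed_a = 32768, proposed_b = 65536)

# Load factor = stored symbols / number of buckets.
load_factor <- n_symbols / hsize
round(load_factor, 2)

# Assumption: each bucket head is one 8-byte pointer (64-bit build).
table_kib <- hsize * 8 / 1024
round(table_kib, 1)
```

With exactly 20,000 symbols the current load factor is about 4.86. It passes 5 once roughly 20,600 symbols are interned, and both proposed sizes bring it below the 0.75 threshold mentioned above.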

To get an idea of the performance implications the repository includes
some benchmarks of looking up the first element in a given hash
bucket, and the last element (for buckets over 10 elements long). The
results are somewhat noisy because, for longer symbol names, hashing the
name and performing string comparisons while searching the list tends to
dominate the time. But for symbols of similar length there is a 2X-4X
difference in lookup time between retrieving the first element in
a bucket and retrieving the last (indicated by the `total` column in
the table).

Increasing the size of `HSIZE` seems like an easy way to improve the
performance of an operation that occurs thousands if not millions of
times for every R session, with very limited cost in memory.



Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Hervé Pagès

On 05/16/2017 09:59 AM, peter dalgaard wrote:



On 16 May 2017, at 18:37 , Suharto Anggono Suharto Anggono via R-devel 
 wrote:

switch(i, ...)
extracts 'i'-th argument in '...'. It is like
eval(as.name(paste0("..", i))) .


Hey, that's pretty neat!


Indeed! Seems like this topic is even more connected to switch()
than I anticipated...

H.



-pd



Just mentioning other things:
- For 'n',
n <- nargs()
can be used.
- sys.call() can be used in place of match.call() .
---

peter dalgaard 
   on Mon, 15 May 2017 16:28:42 +0200 writes:



I think Hervé's idea was just that if switch can evaluate arguments 
selectively, so can stopifnot(). But switch() is .Primitive, so does it from C.


if he just meant that, then "yes, of course" (but not so interesting).


I think it is almost a no-brainer to implement a sequential stopifnot if 
dropping to C code is allowed. In R it gets trickier, but how about this:


Something like this, yes, that's close to what Serguei Sokol had proposed
(and of course I *do*  want to keep the current sophistication
of stopifnot(), so this is really too simple)


Stopifnot <- function(...)
{
n <- length(match.call()) - 1
for (i in 1:n)
{
nm <- as.name(paste0("..",i))
if (!eval(nm)) stop("not all true")
}
}
Stopifnot(2+2==4)
Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)




On 15 May 2017, at 15:37 , Martin Maechler  
wrote:

I'm still curious about Hervé's idea on using  switch()  for the
issue.



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


[Rd] problem running test on a system without /etc/localtime

2017-05-16 Thread Kirill Maslinsky
Hi all, 

A problem with tests while building R.

I'm packaging R for the Sisyphus repository, and its package build
environment, by design, has no /etc/localtime file. This causes a failure
in Sys.timezone during the test run:

[builder@localhost tests]$ ../bin/R --vanilla < reg-tests-1d.R

> ## PR#17186 - Sys.timezone() on some Debian-derived platforms
> (S.t <- Sys.timezone())
Error in normalizePath("/etc/localtime") : 
  (converted from warning) path[1]="/etc/localtime": No such file or
  directory
  Calls: Sys.timezone -> normalizePath
  Execution halted

This is caused by this code:

> Sys.timezone
function (location = TRUE) 
{
tz <- Sys.getenv("TZ", names = FALSE)
if (!location || nzchar(tz)) 
return(Sys.getenv("TZ", unset = NA_character_))
>>  lt <- normalizePath("/etc/localtime")
[remainder of the code skipped]

The file /etc/localtime is optional and is not guaranteed to be present on
any platform. In any case, it is a good idea to check that the file 
exists before calling normalizePath. 
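
A guard of that kind could look like the sketch below. The function name localtime_target is hypothetical and this is not a patch against the actual Sys.timezone() source. It only illustrates returning NA when the file is absent.

```r
# Hypothetical helper: resolve the localtime file only if it exists,
# so that a missing file yields NA instead of a warning or an error.
localtime_target <- function(path = "/etc/localtime") {
  if (!file.exists(path)) return(NA_character_)
  normalizePath(path)
}

# A path that does not exist gives NA.
localtime_target(tempfile("no-such-localtime-"))

# On the current machine: a resolved path, or NA if the file is absent.
localtime_target()
```

A caller can then fall back to the TZ environment variable, or to NA, when the result is NA.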

Sure, this can be worked around by setting TZ environment variable, but
that causes tests to fail in another place:

[builder@localhost tests]$ TZ="GMT" ../bin/R --vanilla < reg-tests-1d.R

> ## format()ing invalid hand-constructed  POSIXlt  objects
> d <- as.POSIXlt("2016-12-06"); d$zone <- 1
> tools::assertError(format(d))
Error: Failed to get error in evaluating format(d)
Execution halted

It seems that the best solution will be to patch Sys.timezone.

-- 
KM



Re: [Rd] problem running test on a system without /etc/localtime

2017-05-16 Thread Dirk Eddelbuettel

On 17 May 2017 at 03:35, Kirill Maslinsky wrote:
| I'm packaging R for the Sisyphus repository, and the package build environment,
| by design, has no /etc/localtime file. This causes a failure in
| Sys.timezone() during the test run:
[...]
| It seems that the best solution will be to patch Sys.timezone.

The file-based approach was AFAIK never successfully standardized.

Setting TZ is a defensible fallback.  At some point last year I got so
annoyed about this (and have the historical Debian attitude that a config
file may be preferable to an environment variable [ which I now think is wrong
for some things like TZ ]) that I wrote the 'gettz' package.  Quick demo in a
Docker container with nothing set:

edd@max:~$ docker run --rm -ti r-base /bin/bash
root@f3848979cab4:/# echo $TZ

root@f3848979cab4:/# R

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Sys.getenv("TZ")  # as expected
[1] ""
> install.packages("gettz")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/gettz_0.0.3.tar.gz'
Content type 'application/x-gzip' length 9064 bytes
==
downloaded 9064 bytes

* installing *source* package ‘gettz’ ...
** package ‘gettz’ successfully unpacked and MD5 sums checked
** libs
g++  -I/usr/share/R/include -DNDEBUG  -fpic  -g -O2 
-fdebug-prefix-map=/build/r-base-3.4.0=. -fstack-protector-strong -Wformat 
-Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c gettz.cpp -o 
gettz.o
g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o gettz.so gettz.o -L/usr/lib/R/lib 
-lR
installing to /usr/local/lib/R/site-library/gettz/libs
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (gettz)

The downloaded source packages are in
‘/tmp/RtmpLvuVz8/downloaded_packages’
> gettz::gettz()
[1] "Etc/UTC"
>


As I recall, R got patched for R 3.3.3 or R 3.4.0 to return "" in more cases.
gettz is a little smarter about looking in more locations than R was at the
time (and hence not dissimilar to what was suggested earlier today, but it
operates at the compiled-code level). It uses a trick I found on StackOverflow
(which is credited in the package).
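
For illustration only, here is a rough R-level sketch of the general
"look in more than one location" idea. It is not gettz's code (which
works at the compiled-code level); the function name guess_tz() and the
order of the checks are assumptions made for this example.

```r
# Hypothetical helper; the lookup order is an assumption, not gettz's logic.
guess_tz <- function(localtime = "/etc/localtime",
                     tzfile    = "/etc/timezone",
                     zoneinfo  = "/usr/share/zoneinfo/") {
    # 1. The TZ environment variable, if set and non-empty.
    tz <- Sys.getenv("TZ", unset = "")
    if (nzchar(tz)) return(tz)

    # 2. A plain-text file holding the zone name, if present.
    if (file.exists(tzfile)) {
        tz <- trimws(readLines(tzfile, n = 1L, warn = FALSE))
        if (length(tz) == 1L && nzchar(tz)) return(tz)
    }

    # 3. The target of the localtime symlink, if it points into zoneinfo.
    #    mustWork = FALSE keeps this silent when the file is missing.
    lt <- normalizePath(localtime, mustWork = FALSE)
    if (startsWith(lt, zoneinfo)) {
        return(substring(lt, nchar(zoneinfo) + 1L))
    }

    NA_character_   # nothing found
}
```

On a system with none of these set, the sketch returns NA rather than
guessing, which leaves the choice of a default to the caller.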

It is certainly not perfect, but it is "good enough" for the uses I had in
packages requiring some localtime information.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org


Re: [Rd] problem running test on a system without /etc/localtime

2017-05-16 Thread Henrik Bengtsson
On Tue, May 16, 2017 at 5:35 PM, Kirill Maslinsky wrote:
> Hi all,
>
> A problem with tests while building R.
>
> I'm packaging R for the Sisyphus repository, and the package build environment,
> by design, has no /etc/localtime file. This causes a failure in
> Sys.timezone() during the test run:
>
> [builder@localhost tests]$ ../bin/R --vanilla < reg-tests-1d.R
>
>> ## PR#17186 - Sys.timezone() on some Debian-derived platforms
>> (S.t <- Sys.timezone())
> Error in normalizePath("/etc/localtime") :
>   (converted from warning) path[1]="/etc/localtime": No such file or directory
> Calls: Sys.timezone -> normalizePath
> Execution halted
>
> This is caused by this code:
>
>> Sys.timezone
> function (location = TRUE)
> {
>     tz <- Sys.getenv("TZ", names = FALSE)
>     if (!location || nzchar(tz))
>         return(Sys.getenv("TZ", unset = NA_character_))
>>>  lt <- normalizePath("/etc/localtime")
> [remainder of the code skipped]
>
> The file /etc/localtime is optional and is not guaranteed to be present on
> any platform. In any case, it is a good idea to check that the file
> exists before calling normalizePath().

Looking at the code
(https://github.com/wch/r-source/blob/R-3-4-branch/src/library/base/R/datetime.R#L26),
could it be that mustWork = FALSE (instead of the default NA) avoids
the warning that causes this check error?

Index: src/library/base/R/datetime.R
===
--- src/library/base/R/datetime.R (revision 72684)
+++ src/library/base/R/datetime.R (working copy)
@@ -23,7 +23,7 @@
 {
 tz <- Sys.getenv("TZ", names = FALSE)
 if(!location || nzchar(tz)) return(Sys.getenv("TZ", unset = NA_character_))
-lt <- normalizePath("/etc/localtime") # Linux, macOS, ...
+lt <- normalizePath("/etc/localtime", mustWork = FALSE) # Linux, macOS, ...
 if (grepl(pat <- "^/usr/share/zoneinfo/", lt)) sub(pat, "", lt)
 else if (lt == "/etc/localtime" && file.exists("/etc/timezone") &&
  dir.exists("/usr/share/zoneinfo") &&
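
To illustrate the difference the patch relies on, here is a small sketch
using a temporary path that does not exist (the variable names are just
for this example). With the default mustWork = NA, normalizePath() warns
when it cannot resolve the path; the "(converted from warning)" in the
report shows that warnings were being promoted to errors in that test
run. With mustWork = FALSE the call is silent.

```r
missing_path <- tempfile("no-such-file-")   # a path that does not exist

# Default mustWork = NA: a warning is signalled for the missing path.
res_na <- tryCatch(normalizePath(missing_path),
                   warning = function(w) "warning")

# mustWork = FALSE: no warning, the path is returned as a string.
res_false <- tryCatch(normalizePath(missing_path, mustWork = FALSE),
                      warning = function(w) "warning")

res_na      # "warning"
res_false   # a path string, no warning
```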

/Henrik

>
> Sure, this can be worked around by setting the TZ environment variable, but
> that causes the tests to fail in another place:
>
> [builder@localhost tests]$ TZ="GMT" ../bin/R --vanilla < reg-tests-1d.R
>
>> ## format()ing invalid hand-constructed  POSIXlt  objects
>> d <- as.POSIXlt("2016-12-06"); d$zone <- 1
>> tools::assertError(format(d))
> Error: Failed to get error in evaluating format(d)
> Execution halted
>
> It seems that the best solution will be to patch Sys.timezone.
>
> --
> KM
