[Rd] Best style to organize code, namespaces

2010-02-22 Thread Ben
Hi all,

I'm hoping someone could tell me what best practices are as far as
keeping programs organized in R.  In most languages, I like to keep
things organized by writing small functions.  So, suppose I want to
write a function that would require helper functions or would just be
too big to write in one piece.  Below are three ways to do this:


### Style 1 (C-style) ###
Foo <- function(x) {
  
}
Foo.subf <- function(x, blah) {
  
}
Foo.subg <- function(x, bar) {
  
}

### Style 2 (Lispish?) ###
Foo <- function(x) {
  Subf <- function(blah) {

  }
  Subg <- function(bar) {

  }
  
}

### Style 3 (Object-Oriented) ###
Foo <- function(x) {
  Subf <- function(blah) {

  }
  Subg <- function(bar) {

  }
  Main <- function() {

  }
  return(list(Subf=Subf, Subg=Subg, Main=Main))
}
### End examples 

Which of these ways is best?  Style 2 seems at first to be the most
natural in R, but I found there are some major drawbacks.  First, it
is hard to debug.  For instance, if I want to debug Subf, I need to
first "debug(Foo)" and then while Foo is debugging, type
"debug(Subf)".  Another big limitation is that I can't write
test-cases (e.g. using RUnit) for Subf and Subg because they aren't
visible in any way at the global level.

For these reasons, style 1 seems to be better than style 2, if less
elegant.  However, style 1 can get awkward because any parameters
passed to the main function are not visible to the others.  In the
above case, the value of "x" must be passed to Foo.subf and Foo.subg
explicitly.  Also there is no enforcement of code isolation
(i.e. anyone can call Foo.subf).

Style 3 is more explicitly object oriented.  It has the advantage of
style 2 in that you don't need to pass x around, and the advantage of
style 1 in that you can still write tests and easily debug the
subfunctions.  However to actually call the main function you have to
type "Foo(x)$Main()" instead of "Foo(x)", or else write a wrapper
function for this.  Either way there is more typing.
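
(The wrapper is only a one-liner, e.g.

Foo2 <- function(x) Foo(x)$Main()

but it's still one more definition per "class".)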

So anyway, what is the best way to handle this?  R does not seem to
have a good way of managing namespaces or avoiding collisions, like a
module system or explicit object-orientation.  How should we get
around this limitation?  I've looked at sample R code in the
distribution and elsewhere, but so far it's been pretty
disappointing---most people seem to write very long, hard to
understand functions.

Thanks for any advice!

-- 
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Best style to organize code, namespaces

2010-02-23 Thread Ben
Thanks to everyone for the responses so far.  I didn't know about
setBreakpoint, mlocal, or proto before.

I think I'll try using proto a lot more and see if that fixes most of
my problems.  It appeals to me for two bad reasons: I had run across
proto when reading the code to ggplot2 earlier but didn't know the
purpose before (and I like ggplot2 and plyr, so Hadley's endorsement
carries some weight with me).  Also, proto methods take the object
itself as an explicit argument---similar to Python's self, and I'm a
big fan of Python.  Anyway, so much for rationality.
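
(The pattern I mean, as an untested sketch from skimming the proto
docs:

library(proto)
acc <- proto(total = 0,
             add = function(., x) .$total <- .$total + x)
acc$add(3)   # "." plays the role of Python's self
acc$total    # [1] 3
)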

I find it interesting that everyone seems to run into this issue, and
everyone solves it in their own way.  I hadn't known that Mark's
mvbutils and Duncan's setBreakpoint (how do I get this function?)
existed before.  I also like Mark Kimpel's trick, which appeals to a
simple-minded person like me.

Either R needs to expand its standard library, or there should be a
standard guide on what you really need to work productively.  I'd have
a hard time in R without ggplot2, plyr, RUnit, perhaps proto now, and
a few other functions and modules, but it's taken a while to figure
out what I needed.  It could be in 6 months I'll discover something
else and realize that I've been wasting my time all along.

Anyway, thanks again for all the responses!


-- 
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proto and baseenv()

2010-02-24 Thread Ben
Wow, thanks for the heads-up.  That is horrible behavior.  But using
baseenv() doesn't seem like the solution either.  I'm new to proto,
but it seems like this is also a big drawback:

> z <- 1
> proto(baseenv(), expr={a=z})$a
Error in eval(expr, envir, enclos) : object "z" not found


-- 
Ben


- Original message -
From: Peter Danenberg 
To: r-devel@r-project.org
Date: Wed, 24 Feb 2010 22:38:54 -0600
I understand why the following happens ($.proto delegates to get,
which ascends the parent hierarchy up to globalenv()), but I still
find it counterintuitive:

  > z <- 1
  > y <- proto(a=2)
  > y$z
  [1] 1

Although this is well-documented behavior, wouldn't it uphold the
principle of least surprise to inherit instead from baseenv() or
emptyenv()? (See attached patch.)

Spurious parent definitions have already been the source of bizarre
bugs, prompting me to use proto like this:

  > z <- 1
  > y <- proto(baseenv(), a=2)
  > y$z
  Error in get(x, env = this, inherits = inh) : object 'z' not found

It's cumbersome, but it ensures that parent definitions don't pollute
my object space; I would rather that be the default behaviour, though.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proto and baseenv()

2010-02-25 Thread Ben
I was disappointed in this behavior because it seems error-prone.
Suppose I declare an environment

> b <- 1
> p <- proto(expr={
+   a <- 2
+   ...
+ })
> p$a
[1] 2
> p$b
[1] 1

Presumably if I ask for p$a or p$b later, it's because I'm interested
in the value of "p$a" or "p$b" that I specifically put inside that
environment.  Otherwise I would just ask for "a" or "b".  If I'm
asking for "p$b" in the above case, that means I forgot to declare b
inside p.  In that case there should be an error telling me so, not
a silent substitution of the wrong quantity.

If someone wanted to do the y$ls() thing, they could always

> y <- proto(a=1)
> with(y, ls())
[1] "a"

Another reason is that there are plenty of other programming languages
that have similar structures and this behavior is very odd.  In
standard languages asking for "b" inside the "p" object gives you an
error, and no one complains.  Even in R, we have this behavior:

> z <- 1
> list(a=3)$z
NULL

(Actually I think the above should be an error, but at least it isn't
1.)  So anyway, I'm not saying that p$b being 1 is an outright 2+2=5
bug, but it does seem to be surprising behavior that leads to bugs.
But I'm sure you're right that there are historical/structural reasons
for this to be the case, so perhaps there's no solution.


-- 
Ben Escoto

- Original message -
From: Thomas Petzoldt 
To: Ben 
Date: Thu, 25 Feb 2010 13:02:40 +0100
On 25.02.2010 06:33, Ben wrote:
> Wow, thanks for the heads-up.  That is horrible behavior.  But using
> baseenv() doesn't seem like the solution either.  I'm new to proto,
> but it seems like this is also a big drawback:
>
>> z<- 1
>> proto(baseenv(), expr={a=z})$a
> Error in eval(expr, envir, enclos) : object "z" not found
>
>

I would say that this behaviour is intentional and not "horrible". proto
objects simply do the same as ordinary functions in R, which also have
full access to variables and functions at a higher level:

Try the following:

 > y <- proto(a=2)
 > y$ls()
[1] "a"


ls() is defined in package base and so would work even if you inherit
from baseenv(), so why is it surprising that proto objects (by default)
inherit objects from other packages and from the user workspace?


Thomas P.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proto and baseenv()

2010-02-25 Thread Ben
> I think you are looking for a different object model than proto
> offers.  There aren't many languages that offer the prototype object
> model.

Yes, you're probably right---I don't have much experience using the
prototype model.  This is the way I expected it to work:

> z <- 1
> p <- proto(expr={a <- z})
> p$a
[1] 1
> p$z
Error in get(x, env = this, inherits = inh) : variable "z" was not found

Also it seems it would lead to fewer bugs if it worked that way.
(Peter Danenberg mentions he's run into bugs because of this, and I
can see why.)  But as I mentioned I'm new to prototype programming.
If it worked like in my snippet, would this lead to less effective
prototype programming?


Thanks,

-- 
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proto and baseenv()

2010-02-26 Thread Ben

> In the end it seems that your real beef is with R, so perhaps you should
> be using a different language.

In my case you may be right.  I do think there are a million things
wrong with R.  For instance, I was looking for a package that
overcomes two of the problems R IMHO has: namespace pollution and the
lack of an easy-to-use standard object system.

Should I be using R?  I do keep asking myself that same question...

> With respect to proto it's really just discussing whether to use
> proto(baseenv(), ...) vs proto(...)

Unfortunately this doesn't fix the problem as was noted earlier:

> z <- 1
> proto(baseenv(), expr={a <- z})$a
Error in eval(expr, envir, enclos) : object "z" not found

> Also, your alternative likely would be unusable due to performance
> whereas proto is fast enough to be usable (see list of applications
> that use it at http://r-proto.googlecode.com/#Applications).  It's
> not as fast as S3 (though sometimes you can get it that fast by
> optimizing your code).  The development version of proto is even
> faster than the current version of proto due to the addition of lazy
> evaluation.

This makes sense to me.


-- 
Ben Escoto

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proto and baseenv()

2010-02-26 Thread Ben
Maybe I'm still not getting something fundamental, but I didn't intend
my "proto(baseenv(), expr={a <- z})$a" example to be realistic.  In
practice "a <- z" would be replaced with hundreds of lines of code,
where many functions are called.  In theory you could track down
every global or package function used and pass each one in explicitly,
but then you would have to put dozens of extra lines of boilerplate.
It's actually worse than that, as this example shows:

> proto(baseenv(), f = function(.) sd(1:3))$f()
Error in get("f", env = proto(baseenv(), f = function(.) sd(1:3)), inherits = 
TRUE)(proto(baseenv(),  : 
  could not find function "sd"
> proto(baseenv(), sd = sd, f = function(.) sd(1:3))$f()
Error in sd(1:3) : could not find function "var"

Not only would every external function have to be specifically
declared with a separate argument, even functions called only
indirectly (like var, via sd, above) may need to be declared.  That
means any change in the implementation of an external function could
break this code.

Again, I may be missing something since I'm new to proto, but I don't
see why you're dismissing this example as "user error".


-- 
Ben Escoto

- Original message -
From: Gabor Grothendieck 
To: Ben 
Date: Fri, 26 Feb 2010 09:28:46 -0500
On Fri, Feb 26, 2010 at 9:01 AM, Ben  wrote:
>
>> In the end it seems that your real beef is with R, so perhaps you should
>> be using a different language.
>
> In my case you may be right.  I do think there are a million things
> wrong with R.  For instance, I was looking for a package that
> overcomes two of the problems R IMHO has: namespace pollution and the
> lack of an easy-to-use standard object system.
>
> Should I be using R?  I do keep asking myself that same question...
>
>> With respect to proto it's really just discussing whether to use
>> proto(baseenv(), ...) vs proto(...)
>
> Unfortunately this doesn't fix the problem as was noted earlier:
>
>> z <- 1
>> proto(baseenv(), expr={a <- z})$a
> Error in eval(expr, envir, enclos) : object "z" not found
>

As already mentioned, let's not confuse user error with actual problems
pertaining to proto and R.  It should have been written like this if
that is what was wanted:

> z <- 1
> proto(baseenv(), a = z)$a
[1] 1

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Associative array?

2010-03-11 Thread Ben
Hi, can someone tell me how to use associative arrays in R?  It can be
a hashtable or some kind of tree, as long as the lookups aren't O(n).
One way to do this is to use names, e.g. in:

list(a=3, ...)[["a"]]

presumably looking up "a" is very quick.  (Can someone tell me offhand
how that is implemented?  Hashtable?)  However, if I wanted to, say,
memoize a numeric function, I can't elegantly use R names because R names must 
be characters.
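
(Concretely, the best I can see how to do with names is something like

cache <- list()
cache[[as.character(2.5)]] <- sqrt(2.5)   # key must go through as.character()
cache[[as.character(2.5)]]                # [1] 1.581139

which works, but the coercion step is what seems inelegant.)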

I found the hash package on CRAN:

http://cran.r-project.org/web/packages/hash/index.html

but it seems the keys are still characters.  Also, I haven't heard
anyone talking about it.  Trees and hashtables are common data
structures, so this problem must come up a lot.

Thanks,

-- 
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Associative array?

2010-03-11 Thread Ben
> lists are generic vectors with names so lookup is O(n). Environments  
> in R are true hash tables for that purpose:

Ahh, thanks for the information!  A function I wrote a while ago that
indexed into a data frame by name was slower than I expected, and now
I know why.
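
(For the record, the environment version is pleasantly direct:

e <- new.env(hash = TRUE)
assign("3.14", 42, envir = e)
get("3.14", envir = e)                        # hashed lookup, not O(n)
exists("2.72", envir = e, inherits = FALSE)   # FALSE, no error
)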

> I don't quite understand - characters are (after raw vectors) the
> most expressive data type, so I'm not quite sure why that would be a
> limitation .. You can cast anything (but raw vector with nulls) into
> to a character.

It's no big problem, it's just that if the solution is to convert to
character type, then there are some implementation details to worry
about.  For instance, I assume that as.character(x) is a reversible
1-1 mapping if x is an integer (and not NA or NULL, etc).  But
apparently that isn't exactly true for floats, and it would get more
complicated for other data types.  So that's why I said it would not
be elegant, but that is a very subjective statement.
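
(A concrete float case -- two distinct doubles that collide on one key:

x1 <- 0.3
x2 <- 0.1 + 0.2
x1 == x2                               # FALSE
as.character(x1) == as.character(x2)   # TRUE
)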

On a deeper level, it seems counterintuitive to me that indexing in R
is O(n).  Furthermore, associative arrays are a fundamental data type,
so I think it's weird that I can read the R tutorial, the R language
definition, and even the manual page for new.env() and still not have
enough information to build a decent one.  So IMHO things would be
better if R had a built-in easy-to-use general purpose associative
array.

> I don't see a problem thus I'm not surprised it didn't come up
> ;). But maybe I'm just missing your point ...

Nope, this has come up before---I think R and I are just on different
wavelengths.  Various things that I think are a problem with R are
apparently not, and it's fine the way it is.

Anyway, sorry for getting off topic ;-).  You posted everything I
needed to know, and I really appreciate your help.


-- 
Ben Escoto

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] dgamma error condition?

2005-11-04 Thread Ben Bolker

   There's an apparent inconsistency between the
behavior of d(pqr)gamma and other distribution
functions for "bad" parameter values.  Specifically,
most distributions give NaN and a warning for bad
parameters (e.g. probabilities <0 or >1).  In contrast,
d(pqr)gamma actually gives an error and stops when shape<0.
I don't see why it has to be this way -- the internal
C code is set up to detect shape<0 (or scale<0) and
return NaN and a warning, and none of the other
distribution functions in that bit of the code have
similar behavior.
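
   For example (output paraphrased; behaviour as of R 2.2.x):

dnorm(1, sd = -1)      # NaN, plus a "NaNs produced" warning
dgamma(1, shape = -1)  # stops with an error instead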

   It would seem more consistent (and would be more
convenient for me -- the error-instead-of-warning
is making me have to jump through additional hoops)
if dgamma just returned NaN and a warning.

    Any thoughts?

   cheers
 Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Brainstorm: Alpha and Beta testing of R versions

2005-11-07 Thread Ben Bolker

   My most common problem with the bug reporting system
is distinguishing between bugs and my own stupidity
or confusion.  So I post to the r-devel list to ask;
even when there is a response, I may then
fail to get around to submitting the bug report itself ...
I know R-core doesn't want the bug list cluttered up with
non-bugs, but this two-step process often gets in the way of
my filing potential bugs.
   (I realize this is straying fairly far from the "alpha/beta
testing" topic to a more general discussion of bug reporting
etc. etc.)
   The main reason I fail to do alpha/beta checking is
that I like to keep using the same version of R as my classes
are currently using, and I haven't yet gone to the trouble of 
maintaining different versions on my system.

   [did anyone have any thoughts on my  4 Nov query about
errors vs warnings in dgamma?]


   cheers
 Ben Bolker

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dgamma error condition?

2005-11-08 Thread Ben Bolker

   thanks.  would you like a patch?
(seems easy enough but I thought I'd offer)
looks like library/stats/R/distn.R and
nmath/rgamma.c need fixing; looks like
qgamma may not check for scale<0 in C
code either ...

   Ben Bolker

Prof Brian Ripley wrote:
> On Fri, 4 Nov 2005, Ben Bolker wrote:
> 
>>
>>   There's an apparent inconsistency between the
>> behavior of d(pqr)gamma and other distribution
>> functions for "bad" parameter values.  Specifically,
>> most distributions give NaN and a warning for bad
>> parameters (e.g. probabilities <0 or >1).  In contrast,
>> d(pqr)gamma actually gives an error and stops when shape<0.
>> I don't see why it has to be this way -- the internal
>> C code is set up to detect shape<0 (or scale<0) and
>> return NaN and a warning, and none of the other
>> distribution functions in that bit of the code have
>> similar behavior.
>>
>>   It would seem more consistent (and would be more
>> convenient for me -- the error-instead-of-warning
>> is making me have to jump through additional hoops)
>> if dgamma just returned NaN and a warning.
>>
>>Any thoughts?
> 
> 
> No one has come up with any, so let us remove the errors in R-devel.
> 
> Note that rgamma is not protected at C level: try rgamma(10, -2), or 
> (worse) rgamma(10, -20)  after removing the stop() call.
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] minor suggestion for optim documentation

2005-11-18 Thread Ben Bolker

   consider adding the following clause to the optim documentation
describing SANN?  I had to go look this up in the
optim.c code for a student who is not proficient
in C, and it may be simple enough to include it
in the documentation.  (This replaces the existing
sentence that ends where the semicolon is.)

Temperatures are decreased according to the
   logarithmic cooling schedule as given in Belisle (1992, p. 890);
   specifically, the temperature is set to
\code{temp / log(((t-1) %/% tmax)*tmax + exp(1))},
   where \code{t} is the current iteration step.
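
   A quick check/plot of the schedule, if that helps review:

   ## step plot of the proposed temperature schedule (temp = 10, tmax = 10)
   temp <- 10; tmax <- 10
   t <- 1:200
   plot(t, temp / log(((t - 1) %/% tmax) * tmax + exp(1)), type = "s")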

   sincerely
 Ben Bolker

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] confint/nls

2006-01-07 Thread Ben Bolker

   I have found some "issues" (bugs?) with nls confidence intervals ...
some with the relatively new "port" algorithm, others more general
(but possibly in the "well, don't do that" category).  I have
corresponded some with Prof. Ripley about them, but I thought I
would just report how far I've gotten in case anyone else has
thoughts.  (I'm finding the code in stats/nls.R and stats/nls-profile.R
quite dense & scary ...)
   All of this has been done with R-devel from 3 Jan 2006; the changes
that Prof. Ripley already made to allow confint.nls not to crash
when algorithm="port" are in R-devel, not R-patched.

   a synopsis of the problems with confint():

with a 1-parameter model (is confint not appropriate for 1-parameter
models? it doesn't say so in the docs [by the way, "normality" is
misspelled as "nornality" in ?confint]):

algorithm=default or plinear: get a complaint from qr.qty ('qr' and 
'y' must have the same number of rows)
port: "cannot allocate vector of size [large]" [caused by C code
looking for dims when they aren't there]

   2-parameter models:
default OK
port "cannot allocate vector"
plinear "Error in xy.coords"

3-parameter models are OK

   I can fix the 2-parameter port case by adding drop=FALSE in
appropriate places, but I wanted to check in just in case
there are better/more efficient ways than my slogging through
one case at a time ...

   apologies for the long message, but I am temporarily cut
off from any way to post these files to the web.

   cheers
 Ben Bolker

code that tests various combinations of numbers of parameters
and algorithms:
---
resmat = array(dim=c(3,2,3),
dimnames=list(npar=1:3,c("fit","confint"),c("default","plinear","port")))
resmat.fix <- resmat
## sim. values
npts=1000
set.seed(1001)
x = runif(npts)
b = 0.7
y = x^b+rnorm(npts,sd=0.05)
a =0.5
y2 = a*x^b+rnorm(npts,sd=0.05)
c = 1.0
y3 = a*(x+c)^b+rnorm(npts,sd=0.05)
d = 0.5
y4 = a*(x^d+c)^b+rnorm(npts,sd=0.05)

testfit <- function(model, start, alg) {
   tryfit <- try(fit <- nls(model, start=start, algorithm=alg,
                            control=list(maxiter=200)))
   if (class(tryfit)!="try-error") {
 fitcode="OK"
 tryci <- try(confint(fit))
 if (class(tryci)!="try-error") {
   cicode="OK"
 } else cicode = as.character(tryci)
   } else {
 fitcode = as.character(tryfit)
 cicode="?"
   }
   c(fitcode,cicode)
}

m1 = c(y~x^b,y2~a*x^b,y3~a*(x+exp(logc))^b)
m2 = c(y2~x^b,y3~(x+exp(logc))^b,y4~(x^d+exp(logc))^b)
s1 = list(c(b=1),c(a=1,b=1),c(a=1,b=1,logc=0))
s2 = list(c(b=1),c(b=1,logc=0),c(b=1,logc=0,d=0.5))

for (p in 1:3) {
   resmat[p,,"default"] <- testfit(m1[[p]],start=s1[[p]],alg=NULL)
   resmat[p,,"port"] <- testfit(m1[[p]],start=s1[[p]],alg="port")
}

for (p in 1:3) {
   resmat[p,,"plinear"] <- testfit(m2[[p]],start=s2[[p]],alg="plinear")
}

print(resmat)
set.seed(1002)
example(nls,local=TRUE)

diffs:
--
*** /usr/local/src/R/R-devel/src/library/stats/R/nls.R  2006-01-07 10:57:08.0 -0500
--- nlsnew.R  2006-01-07 19:18:53.0 -0500
***************
*** 266,277 ****
     gradSetArgs[[1]] <- (~attr(ans, "gradient"))[[2]]
     gradCall <-
         switch(length(gradSetArgs) - 1,
!               call("[", gradSetArgs[[1]], gradSetArgs[[2]]),
!               call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]]),
                call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]],
!                    gradSetArgs[[3]]),
                call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]],
!                    gradSetArgs[[3]], gradSetArgs[[4]]))
     getRHS.varying <- function()
     {
         ans <- getRHS.noVarying()
--- 266,277 ----
     gradSetArgs[[1]] <- (~attr(ans, "gradient"))[[2]]
     gradCall <-
         switch(length(gradSetArgs) - 1,
!               call("[", gradSetArgs[[1]], gradSetArgs[[2]],drop=FALSE),
!               call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]],drop=FALSE),
                call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]],
!                    gradSetArgs[[3]],drop=FALSE),
                call("[", gradSetArgs[[1]], gradSetArgs[[2]], gradSetArgs[[2]],
!                    gradSetArgs[[3]], gradSetArgs[[4]],drop=FALSE))
     getRHS.varying <- function()
     {
         ans <- getRHS.noVarying()
***************
*** 331,337 ****
         else {
             vary
         }, envir = thisEnv)
!    gradCall[[length(gradCall)]] <<- usePar

[Rd] prod(numeric(0)) surprise

2006-01-08 Thread Ben Bolker

   It surprised me that prod(numeric(0)) is 1.
I guess if you say (operation(nothing) == identity
element) this makes sense, but ??

Looking in the code, this makes sense:
basically (s=1; for i=0 to length(x),
multiply s by x[i]) -- which comes out to 1.

   What *should* prod(numeric(0)) produce?
I couldn't find the answer documented anywhere.

   (And how about sum(numeric(0))==0,
which for some reason makes more intuitive sense
to me, but is really exactly the same thing --
consider exp(sum(log(numeric(0)))) ... ?)
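
   (I suppose the convention does keep identities like
prod(x) == prod(x[cond]) * prod(x[!cond]) working when one
subset is empty:

x <- 1:5
prod(x[x > 10])                      # 1
prod(x[x > 10]) * prod(x[x <= 10])   # same as prod(x)
sum(x[x > 10])                       # 0, the additive analogue
)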

   cheers
     Ben Bolker

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] prod(numeric(0)) surprise

2006-01-08 Thread Ben Bolker
Duncan Murdoch wrote:
> On 1/8/2006 9:24 PM, Ben Bolker wrote:
> 
>>It surprised me that prod(numeric(0)) is 1.
>> I guess if you say (operation(nothing) == identity
>> element) this makes sense, but ??
> 
> 
> What value were you expecting, or were you expecting an error?  I can't 
> think how any other value could be justified, and throwing an error 
> would make a lot of formulas more complicated.
> 
>>
> 
> 
> That's a fairly standard mathematical convention, which is presumably 
> why sum and prod work that way.
> 
> Duncan Murdoch

   OK.  I guess I was expecting NaN/NA (as opposed to an error),
but I take the "this makes everything else more complicated" point.
Should this be documented or is it just too obvious ... ?
(Funny -- I'm willing to take gamma(1)==1 without any argument
or suggestion that it should be documented ...)


   cheers
 Ben


-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wikis (was about prod(numeric(0)))

2006-01-09 Thread Ben Bolker

Tony Plate  acm.org> writes:

 >
 > Since the virtue and reliability of Wikis was brought up, I created a R
 > Wiki page for this at
 > 
http://www.sciviews.org/_rgui/wiki/doku.php?id=beginners:surprises:emptysetfuncs
 >
 >
 > Anyone: please correct errors and improve it!
 >
 > Tony Plate
 >

   OK, now I have another question:
I see a wiki at
http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome
administered by Detlef Steuer
   I see another at http://www.sciviews.org/_rgui/wiki/doku.php
(Philippe Grosjean)
which is nominally geared toward beginners/GUI interfaces.

  I'm not saying I could do any better, but both of these look
as though they'd be pretty hard to get into if you were really
a beginner or intermediate R user looking for info ... Paul
Johnson's Rtips (http://www.ku.edu/~pauljohn/R/Rtips.html) is actually
about the best example out there -- the flat hierarchy might not
work too well for a really big wiki, but having at least
a good first-level hierarchy set up (and making sure that
what new users see is not a lot of detail about how to extend
the wiki) seems really important.

   Sorry to criticize, but I am happy to start working
on populating a wiki -- but only if the structure is there
so that I can figure out where to put stuff and hope that
someone who needs it will ever be able to find it ...

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] nls profile with port/constraints

2006-01-17 Thread Ben Bolker

   Sorry to report further difficulties with
nls and profiling and constraints ... the problem
this time (which I didn't check for in my last
round of testing) is that the nls profiler doesn't
seem to respect constraints that have been
set when using the port algorithm.
   See test code below ...
   If I can I will try to hack the code, but I will
probably start by redefining my function with
some workarounds to make the fit quadratically "bad" (but well-defined)
when the parameters are negative ...
As always, please don't hesitate to correct me
if I'm being an idiot ...

   cheers
 Ben Bolker

---
rm(list=ls())

npts=10
set.seed(1001)

a =2
b =0.5
x= runif(npts)
y = a*x/(1+a*b*x)+rnorm(npts,sd=0.2)

gfun <- function(a,b,x) {
   if (a<0 || b<0) stop("bounds violated")
   a*x/(1+a*b*x)
}

m1 = nls(y~gfun(a,b,x),algorithm="port",
   lower=c(0,0),start=c(a=1,b=1))

try(confint(m1))



for what it's worth, the logic appears to be OK in mle in the stats4
library:
--
library(stats4)

mfun <- function(a,b,s) {
   if (a<0 || b<0 || s<0) stop("bounds violated")
   -sum(dnorm(y,a*x/(1+a*b*x),sd=s,log=TRUE))
}

m2 = mle(minuslogl=mfun,
   start=list(a=1,b=1,s=0.1),
   method="L-BFGS-B",lower=c(0.002,0.002,0.002))

confint(m2)


m2b = mle(minuslogl=mfun,
   fixed=list(b=0),start=list(a=1,s=0.1),
   method="L-BFGS-B",lower=c(0.002,0.002,0.002))
## set boundary slightly above zero to avoid
## boundary cases

dev <- 2*(-logLik(m2b)+logLik(m2))
as.numeric(pchisq(dev,lower.tail=FALSE,df=1))


-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Display an Image on a Plane

2006-01-19 Thread Ben Bolker
Labbe, Vincent (AEREX  drdc-rddc.gc.ca> writes:

> 
> Hi,
> 
> I am new to R and I would like to display an image on a plane in a 3D plot,
> i.e. I would like to be able to specify a theta and a phi parameters like in
> the function persp to display a 2D image on an inclined plane.
> 
> Regards,
> 
> vincent
> 

   can't think of an easy way to do this: what do you mean by "image"
exactly?  A bitmapped image from a file?  Or something like the
output of image()?  If the latter, you may be able to cobble together something
using the trans3d() function (i.e., manually recreating an image()
by drawing colored squares, but transforming each of them to the 3D
perspective).  If the former, you might be able to do something with
the pixmap package ...
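
  A rough, untested sketch of the trans3d() idea:

## draw image()-style cells as colored polygons on the z = 0.5 plane
vals <- volcano[1:20, 1:20]
cols <- terrain.colors(100)[cut(vals, 100)]
pm <- persp(c(0, 1), c(0, 1), matrix(0, 2, 2), theta = 30, phi = 25,
            zlim = c(0, 1), border = NA)
xs <- seq(0, 1, length.out = 21)
ys <- seq(0, 1, length.out = 21)
for (j in 1:20) for (i in 1:20)
    polygon(trans3d(xs[c(i, i + 1, i + 1, i)], ys[c(j, j, j + 1, j + 1)],
                    rep(0.5, 4), pmat = pm),
            col = cols[(j - 1) * 20 + i], border = NA)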

  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] nls profiling with algorithm="port" may viol ate bounds (PR#8508)

2006-01-21 Thread Ben Bolker
Spencer Graves  pdf.com> writes:

> 
> Hi, Ben, et al.:
> 
>     The issue Ben identified with confint(nls(... )) generates a hard 
> failure for me.  

  "We" (being Brian Ripley and I) know about this already.
  I'm sorry I failed to specify enough info in my bug report,
but I was using R-devel/2.3.0 of (I think?) 11 January, under Linux.
Your problem is actually PR #8428, which is fixed enough to prevent
a crash in 2.2.1 patched and really fixed in 2.3.0, all thanks to
Brian Ripley.

  cheers
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] too-large notches in boxplot (PR #7690)

2006-01-23 Thread Ben Bolker

   PR #7690 points out that if the confidence intervals (+/-1.58 
IQR/sqrt(n)) in a boxplot with notch=TRUE are larger than the
hinges -- which is most likely to happen for small n and asymmetric
distributions -- the resulting plot is ugly, e.g.:

set.seed(1001)
npts <- 5
X <- rnorm(2*npts,rep(3:4,each=npts),sd=1)
f <- factor(rep(1:2,each=npts))
boxplot(X~f)
boxplot(X~f,notch=TRUE)

   I can imagine debate about what should be done in this case --
you could just say "don't do that", since the notches are based
on an asymptotic argument ... the diff below just truncates
the notches to the hinges, but produces a warning saying that the 
notches have been truncated.

   ?? what should the behavior be ??

  the diff is against the 11 Jan version of R 2.3.0



*** newboxplot.R2006-01-23 14:32:12.0 -0500
--- oldboxplot.R2006-01-23 14:29:29.0 -0500
***
*** 84,98 
   bplt <- function(x, wid, stats, out, conf, notch, xlog, i)
   {
 ## Draw single box plot
-   conf.ok <- TRUE
-   if(!any(is.na(stats))) {
-   ## check for overlap of notches and hinges
-   if (notch && (stats[2]>conf[1] || stats[4]<conf[2]))
-   warning("notches > hinges: notches truncated")
   axes <- is.null(pars$axes)
   if(!axes) { axes <- pars$axes; pars$axes <- NULL }
   if(axes) {
--- 231,243 
   xysegments <- segments
   }

   for(i in 1:n)
!   bplt(at[i], wid=width[i],
  stats= z$stats[,i],
  out  = z$out[z$group==i],
  conf = z$conf[,i],
  notch= notch, xlog = xlog, i = i)
!
   axes <- is.null(pars$axes)
   if(!axes) { axes <- pars$axes; pars$axes <- NULL }
   if(axes) {

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R on the brain

2006-01-30 Thread Ben Bolker

   I was sitting in the coffee room at work listening to people complain
about a recent seminar about nanotechnology using the terms 
nanofluidics, nanofactory, nano-this, and nano-that ... I found myself 
thinking "well the speaker should just
have said
   with(nano,
  ...)

   Un(?)fortunately there's no-one here I can share that thought with.

-- 
620B Bartram Hall[EMAIL PROTECTED]
Zoology Department, University of Floridahttp://www.zoo.ufl.edu/bolker
Box 118525   (ph)  352-392-5697
Gainesville, FL 32611-8525   (fax) 352-392-3704

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] silent recycling in logical indexing

2018-01-04 Thread Ben Bolker

  Sorry if this has been covered here somewhere in the past, but ...

  Does anyone know why logical vectors are *silently* recycled, even
when they are incommensurate lengths, when doing logical indexing?  This
is as documented:

  For ‘[’-indexing only: ‘i’, ‘j’, ‘...’ can be logical
  vectors, indicating elements/slices to select.  Such vectors
  are recycled if necessary to match the corresponding extent.

but IMO weird:

> x <- c(TRUE,TRUE,FALSE)
> y <- c(TRUE,FALSE)
> x[y]
[1]  TRUE FALSE

## (TRUE, FALSE) gets recycled to (TRUE,FALSE,TRUE) and selects
##  the first and third elements

If we do logical operations instead we do get a warning:

> x | y
[1] TRUE TRUE TRUE
Warning message:
In x | y : longer object length is not a multiple of shorter object length

  Is it just too expensive to test for incomplete recycling when doing
subsetting, or is there a sensible use case for incomplete recycling?

  Ll. 546ff of src/main/subscript.c suggest that there is a place in the
code where we already know if incomplete recycling has happened ...

 Thoughts?

   cheers
 Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] silent recycling in logical indexing

2018-01-04 Thread Ben Bolker
Hmm.

Chuck: I don't see how this example represents
incomplete/incommensurate recycling. Doesn't TRUE replicate from
length-1 to length-3 in this case (mat[c(TRUE,FALSE),2] would be an
example of incomplete recycling)?
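
(i.e.

mat <- diag(1:3)
mat[c(TRUE, FALSE), 2]   # recycled to (T, F, T): rows 1 and 3, silently
## [1] 0 0
)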

William: clever, but maybe too clever unless you really need the
speed? (The clever way is 8 times faster in the following case ...)

x <- rep(1,1e6)
rbenchmark::benchmark(x[c(FALSE,TRUE,FALSE)],x[seq_along(x) %% 3 == 2])

On the other hand, it takes 0.025 vs 0.003 seconds per iteration ...
fortunes::fortune("7ms")


On Thu, Jan 4, 2018 at 4:09 PM, Berry, Charles  wrote:
>
>
>> On Jan 4, 2018, at 11:56 AM, Ben Bolker  wrote:
>>
>>
>>  Sorry if this has been covered here somewhere in the past, but ...
>>
>>  Does anyone know why logical vectors are *silently* recycled, even
>> when they are incommensurate lengths, when doing logical indexing?
>
> It is convenient to use a single `TRUE' in programmatic manipulation of 
> subscripts in the same manner as using an empty subscript interactively:
>
>> mat<-diag(1:3)
>> expr1 <- quote(mat[])
>> expr1[[3]] <- TRUE
>> expr1[[4]] <- 2
>> eval(expr1)
> [1] 0 2 0
>> mat[,2]
> [1] 0 2 0
>
> HTH,
>
> Chuck

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] silent recycling in logical indexing

2018-01-04 Thread Ben Bolker
PS I'm tempted to insert a warning at this point and see how often it
actually gets triggered ...

On Thu, Jan 4, 2018 at 4:44 PM, Ben Bolker  wrote:
> Hmm.
>
> Chuck: I don't see how this example represents
> incomplete/incommensurate recycling. Doesn't TRUE replicate from
> length-1 to length-3 in this case (mat[c(TRUE,FALSE),2] would be an
> example of incomplete recycling)?
>
> William: clever, but maybe too clever unless you really need the
> speed? (The clever way is 8 times faster in the following case ...)
>
> x <- rep(1,1e6)
> rbenchmark::benchmark(x[c(FALSE,TRUE,FALSE)],x[seq_along(x) %% 3 == 2])
>
> On the other hand, it takes 0.025 vs 0.003 seconds per iteration ...
> fortunes::fortune("7ms")
>
>
> On Thu, Jan 4, 2018 at 4:09 PM, Berry, Charles  wrote:
>>
>>
>>> On Jan 4, 2018, at 11:56 AM, Ben Bolker  wrote:
>>>
>>>
>>>  Sorry if this has been covered here somewhere in the past, but ...
>>>
>>>  Does anyone know why logical vectors are *silently* recycled, even
>>> when they are incommensurate lengths, when doing logical indexing?
>>
>> It is convenient to use a single `TRUE' in programmatic manipulation of 
>> subscripts in the same manner as using an empty subscript interactively:
>>
>>> mat<-diag(1:3)
>>> expr1 <- quote(mat[])
>>> expr1[[3]] <- TRUE
>>> expr1[[4]] <- 2
>>> eval(expr1)
>> [1] 0 2 0
>>> mat[,2]
>> [1] 0 2 0
>>
>> HTH,
>>
>> Chuck

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] fix potential integer overflow in mantelhaen.test ?

2018-01-24 Thread Ben Bolker
  Someone who was running into an integer overflow warning (and
downstream error) with mantelhaen.test() on a large data set asked about
it on StackOverflow:

https://stackoverflow.com/questions/48422398/na-nan-inf-in-foreign-function-error-with-mantelhaen-test/48428596#48428596

  Coercing ntot to double (i.e. ntot -> as.numeric(ntot)) on lines 283
and 285 here:

https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/stats/R/mantelhaen.test.R#L283

  seems to fix the problem, fairly harmlessly (*maybe* there are some
edge cases where the integer and floating-point computations give
different answers??? these values are going to get sent to qr.solve() a
few lines later in any case ...)
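
  (The overflow itself is easy to reproduce in isolation; magnitudes
here are made up:

ntot <- 50000L
ntot * ntot              # NA, with an integer-overflow warning
as.numeric(ntot) * ntot  # 2.5e+09, exact as a double
)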

   If people think this is worthwhile I could submit a bug report.

 cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] bugzilla issues

2018-01-25 Thread Ben Bolker

 tl;dr is the R bug tracker down or am I being an idiot? Help please ...


 I decided I would follow up on

https://stat.ethz.ch/pipermail/r-devel/2018-January/075410.html

(reporting/suggesting a patch for an issue in stats::mantelhaen.test()
with large data sets)

  Reading the instructions at https://www.r-project.org/bugs.html
suggests that if I'm sure I have found something worthy of posting to
the R bug tracker, I should go to https://bugs.r-project.org/bugzilla3

This results in


Software error:

Can't locate Email/Sender/Simple.pm in @INC (you may need to install the
Email::Sender::Simple module) (@INC contains: .
lib/x86_64-linux-gnu-thread-multi lib /etc/perl
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1 /usr/local/share/perl/5.24.1
/usr/lib/x86_64-linux-gnu/perl5/5.24 /usr/share/perl5
/usr/lib/x86_64-linux-gnu/perl/5.24 /usr/share/perl/5.24
/usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at
Bugzilla/Mailer.pm line 26.
BEGIN failed--compilation aborted at Bugzilla/Mailer.pm line 26.
Compilation failed in require at Bugzilla/Auth.pm line 22.
BEGIN failed--compilation aborted at Bugzilla/Auth.pm line 22.
Compilation failed in require at Bugzilla.pm line 23.
BEGIN failed--compilation aborted at Bugzilla.pm line 23.
Compilation failed in require at /var/lib/bugzilla4/index.cgi line 15.
BEGIN failed--compilation aborted at /var/lib/bugzilla4/index.cgi line 15.

For help, please send mail to the webmaster (ad...@rforge.net), giving
this error message and the time and date of the error.
===

e-mailing ad...@rforge.net bounces with

This is the mail system at host hagal.urbanek.info.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

   The mail system

: User unknown in virtual alias table
=

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sparse.model.matrix Generates Non-Existent Factor Levels if Ord.factor Columns Present

2018-02-08 Thread Ben Bolker

  color and clarity are ordered factors, so sparse.model.matrix is
generating orthogonal-polynomial contrasts  (see ?contr.poly).  This is
by design ...  what are you trying to do?  Are you interested in fac2sparse?
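
  For instance, the ".L"/".Q"/"^4" names are just contr.poly's column
names (color has 7 levels), not levels of the factor:

colnames(contr.poly(7))   # ".L" ".Q" ".C" "^4" "^5" "^6"
options("contrasts")      # ordered factors use contr.poly by default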

On 18-02-07 11:00 PM, Dario Strbenac wrote:
> Good day,
> 
> Sometimes, sparse.model.matrix outputs a dgCMatrix which has column names 
> consisting of factor levels that were not in the original dataset. The first 
> factor appears to be correctly transformed, but the following factors don't. 
> For example:
> 
> diamonds <- as.data.frame(ggplot2::diamonds)
>> colnames(sparse.model.matrix(~ . -1, diamonds))
>  [1] "carat""cutFair"  "cutGood"  "cutVery Good" "cutPremium" 
>   "cutIdeal" "color.L"  "color.Q"  "color.C"  "color^4"  
> "color^5" 
> [12] "color^6"  "clarity.L""clarity.Q""clarity.C""clarity^4"  
>   "clarity^5""clarity^6""clarity^7""depth""table"
> "price"   
> [23] "x""y""z"
> 
> The variables color and clarity don't have factor levels which have been 
> suffixed to them in the transformed matrix. The values in those columns are 
> also wrong. Changing the Ord.factor columns into simply being factors fixes 
> the problem. 
> 
>> diamonds[, "cut"] <- factor(as.character(diamonds[, "cut"]))
>> diamonds[, "color"] <- factor(as.character(diamonds[, "color"]))
>> diamonds[, "clarity"] <- factor(as.character(diamonds[, "clarity"]))
> 
>> colnames(sparse.model.matrix(~ . -1, diamonds)) # No more invented factor 
>> levels.
>  [1] "carat""cutFair"  "cutGood"  "cutIdeal" "cutPremium" 
>   "cutVery Good" "colorE"   "colorF"   "colorG"   "colorH"  
> [11] "colorI"   "colorJ"   "clarityIF""claritySI1"   "claritySI2" 
>   "clarityVS1"   "clarityVS2"   "clarityVVS1"  "clarityVVS2"  "depth"   
> [21] "table""price""x""y""z"
> 
> Can it be made to work correctly for both plain and ordered factors?
> 
>> sessionInfo()
> R Under development (unstable) (2018-02-06 r74231)
> Platform: i386-w64-mingw32/i386 (32-bit)
> 
> other attached packages:
> [1] Matrix_1.2-12
> 
> loaded via a namespace (and not attached):
>  [1] colorspace_1.3-2 scales_0.5.0 compiler_3.5.0   lazyeval_0.2.1  
>  [5] plyr_1.8.4   pillar_1.1.0 gtable_0.2.0 tibble_1.4.2
>  [9] Rcpp_0.12.15 ggplot2_2.2.1grid_3.5.0   rlang_0.1.6 
> [13] munsell_0.4.3lattice_0.20-35
> 
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] backquotes and term.labels

2018-03-07 Thread Ben Bolker
I knew I had seen this before but couldn't previously remember where.
https://github.com/lme4/lme4/issues/441 ... I initially fixed with
gsub(), but (pushed by Martin Maechler to do better) I eventually
fixed it by storing the original names of the model frame (without
backticks) as an attribute for later retrieval:
https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.
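
The gist of it, as a minimal sketch ("varnames.fixed" is the attribute
name we used):

d <- data.frame(y = 1:3, `x y` = 4:6, check.names = FALSE)
fr <- model.frame(y ~ `x y`, data = d)
tt <- terms(fr)
attr(tt, "varnames.fixed") <- names(fr)  # backtick-free names, stashed early
attr(tt, "varnames.fixed")               # "y"  "x y"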


On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
 wrote:
> Thanks to Bill Dunlap for the clarification.  On follow-up it turns out that
> this will be an issue for many if not most of the routines in the survival
> package: a lot of them look at the terms structure and make use of the
> dimnames of attr(terms, 'factors'), which also keeps the unneeded
> backquotes.  Others use the term.labels attribute.  To dodge this I will
> need to create a fixterms() routine which I call at the top of every single
> routine in the library.
>
> Is there a chance for a fix at a higher level?
>
> Terry T.
>
>
>
> On 03/05/2018 03:55 PM, William Dunlap wrote:
>>
>> I believe this has to do with terms() making "term.labels" (hence the dimnames
>> of "factors")
>> with deparse(), so that the backquotes are included for non-syntactic
>> names.  The backquotes
>> are not in the column names of the input data.frame (nor model frame) so
>> you get a mismatch
>> when subscripting the data.frame or model.frame with elements of
>> terms()$term.labels.
>>
>> I think you can avoid the problem by adding right after
>>  ll <- attr(Terms, "term.labels")
>> the line
>>  ll <- gsub("^`|`$", "", ll)
>>
>> E.g.,
>>
>>  > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y
>> z`=cos(1:5)+2)
>>  > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
>>  > m <- model.frame(Terms, data=d)
>>  > colnames(m)
>> [1] "y""log(`b$a$d`)" "x y z"
>>  > attr(Terms, "term.labels")
>> [1] "log(`b$a$d`)" "`x y z`"
>>  >   ll <- attr(Terms, "term.labels")
>>  > gsub("^`|`$", "", ll)
>> [1] "log(`b$a$d`)" "x y z"
>>
>> It is a bit of a mess.
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com 
>>
>> On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
>> mailto:r-devel@r-project.org>> wrote:
>>
>> A user reported a problem with the survdiff function and the use of
>> variables that
>> contain a space.  Here is a simple example.  The same issue occurs in
>> survfit for the
>> same reason.
>>
>> lung2 <- lung
>> names(lung2)[1] <- "in st"   # old name is inst
>> survdiff(Surv(time, status) ~ `in st`, data=lung2)
>> Error in `[.data.frame`(m, ll) : undefined columns selected
>>
>> In the body of the code the program want to send all of the right-hand
>> side variables
>> forward to the strata() function.  The code looks more or less like
>> this, where m is
>> the model frame
>>
>>Terms <- terms(m)
>>index <- attr(Terms, "term.labels")
>>if (length(index) ==0)  X <- rep(1L, n)  # no covariates
>>else X <- strata(m[index])
>>
>> For the variable with a space in the name the term.label is "`in st`",
>> and the
>> subscript fails.
>>
>> Is this intended behaviour or a bug?  The issue is that the name of
>> this column in the
>> model frame does not have the backtics, while the terms structure does
>> have them.
>>
>> Terry T.
>>
>> __
>> R-devel@r-project.org  mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>>
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fwd: Re: [EXTERNAL] Re: backquotes and term.labels

2018-03-08 Thread Ben Bolker
Meant to respond to this but forgot.

 I didn't write a new terms() function  -- I added an attribute to the
terms() (a vector of the names
of the constructed model matrix), thus preserving the information at
the point when it was available.
  I do agree that it would be preferable to have an upstream fix ...


On Thu, Mar 8, 2018 at 9:39 AM, Therneau, Terry M., Ph.D. via R-devel
 wrote:
> Ben,
>
>
> Looking at your notes, it appears that your solution is to write your own
> terms() function
> for lme.  It is easy to verify that the "varnames.fixed" attribute is not
> returned by the
> usual terms function.
>
> Then I also need to write my own terms function for the survival and coxme
> packages?
> Because of the need to treat strata() terms in a special way I manipulate
> the
> formula/terms in nearly every routine.
>
> Extrapolating: every R package that tries to examine formulas and partition
> them into bits
> needs its own terms function?  This does not look like a good solution to
> me.
>
> On 03/07/2018 07:39 AM, Ben Bolker wrote:
>>
>> I knew I had seen this before but couldn't previously remember where.
>> https://github.com/lme4/lme4/issues/441 ... I initially fixed with
>> gsub(), but (pushed by Martin Maechler to do better) I eventually
>> fixed it by storing the original names of the model frame (without
>> backticks) as an attribute for later retrieval:
>>
>> https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.
>>
>>
>> On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
>>  wrote:
>>>
>>> Thanks to Bill Dunlap for the clarification.  On follow-up it turns out
>>> that
>>> this will be an issue for many if not most of the routines in the
>>> survival
>>> package: a lot of them look at the terms structure and make use of the
>>> dimnames of attr(terms, 'factors'), which also keeps the unneeded
>>> backquotes.  Others use the term.labels attribute.  To dodge this I will
>>> need to create a fixterms() routine which I call at the top of every
>>> single
>>> routine in the library.
>>>
>>> Is there a chance for a fix at a higher level?
>>>
>>> Terry T.
>>>
>>>
>>>
>>> On 03/05/2018 03:55 PM, William Dunlap wrote:
>>>>
>>>> I believe this has to do with terms() making "term.labels" (hence the
>>>> dimnames
>>>> of "factors")
>>>> with deparse(), so that the backquotes are included for non-syntactic
>>>> names.  The backquotes
>>>> are not in the column names of the input data.frame (nor model frame) so
>>>> you get a mismatch
>>>> when subscripting the data.frame or model.frame with elements of
>>>> terms()$term.labels.
>>>>
>>>> I think you can avoid the problem by adding right after
>>>>   ll <- attr(Terms, "term.labels")
>>>> the line
>>>>   ll <- gsub("^`|`$", "", ll)
>>>>
>>>> E.g.,
>>>>
>>>>   > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x
>>>> y
>>>> z`=cos(1:5)+2)
>>>>   > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
>>>>   > m <- model.frame(Terms, data=d)
>>>>   > colnames(m)
>>>> [1] "y""log(`b$a$d`)" "x y z"
>>>>   > attr(Terms, "term.labels")
>>>> [1] "log(`b$a$d`)" "`x y z`"
>>>>   >   ll <- attr(Terms, "term.labels")
>>>>   > gsub("^`|`$", "", ll)
>>>> [1] "log(`b$a$d`)" "x y z"
>>>>
>>>> It is a bit of a mess.
>>>>
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com <http://tibco.com>
>>>>
>>>> On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
>>>> mailto:r-devel@r-project.org>> wrote:
>>>>
>>>>  A user reported a problem with the survdiff function and the use of
>>>> variables that
>>>>  contain a space.  Here is a simple example.  The same issue occurs
>>>> in
>>>> survfit for the
>>>>  same reason.
>>>>
>>>>  lung2 <- lung
>>>>  names(lung2)[1] <- "in st"   # old name is inst

Re: [Rd] Fwd: Re: [EXTERNAL] Re: backquotes and term.labels

2018-03-08 Thread Ben Bolker


On 18-03-08 10:07 AM, Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Thu, 8 Mar 2018 09:42:40 -0500 writes:
> 
> > Meant to respond to this but forgot.
> > I didn't write a new terms() function  -- I added an attribute to the
> > terms() (a vector of the names
> > of the constructed model matrix), thus preserving the information at
> > the point when it was available.
> > I do agree that it would be preferable to have an upstream fix ...
> 
> did anybody ever propose a small patch to the upstream sources ?
> 
> -- including a REPREX (or 2: one for lme4, one for survival) 
> 
> I'm open to look at one .. not for the next few days, though.
> 
> Martin

  Didn't get around to it ...  a bit worried about doing it 100%
back-compatibly, also a bit scared in general of messing with such deep
stuff. It could probably done *non*-backward-compatibly simply by
changing line 459 of library/stats/R/models.R to

varnames <- vapply(vars, function(x) deparse2(x,backtick=FALSE),
" ")[-1L]

... ?


> 
> 
> > On Thu, Mar 8, 2018 at 9:39 AM, Therneau, Terry M., Ph.D. via R-devel
> >  wrote:
> >> Ben,
> >> 
> >> 
> >> Looking at your notes, it appears that your solution is to write your 
> own
> >> terms() function
> >> for lme.  It is easy to verify that the "varnames.fixed" attribute is 
> not
> >> returned by the
> >> usual terms function.
> >> 
> >> Then I also need to write my own terms function for the survival and 
> coxme
> >> packages?
> >> Because of the need to treat strata() terms in a special way I 
> manipulate
> >> the
> >> formula/terms in nearly every routine.
> >> 
> >> Extrapolating: every R package that tries to examine formulas and 
> partition
> >> them into bits
> >> needs its own terms function?  This does not look like a good solution 
> to
> >> me.
> >> 
> >> On 03/07/2018 07:39 AM, Ben Bolker wrote:
> >>> 
> >>> I knew I had seen this before but couldn't previously remember where.
> >>> https://github.com/lme4/lme4/issues/441 ... I initially fixed with
> >>> gsub(), but (pushed by Martin Maechler to do better) I eventually
> >>> fixed it by storing the original names of the model frame (without
> >>> backticks) as an attribute for later retrieval:
> >>> 
> >>> 
> https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.
> >>> 
> >>> 
> >>> On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
> >>>  wrote:
> >>>> 
> >>>> Thanks to Bill Dunlap for the clarification.  On follow-up it turns 
> out
> >>>> that
> >>>> this will be an issue for many if not most of the routines in the
> >>>> survival
> >>>> package: a lot of them look at the terms structure and make use of 
> the
> >>>> dimnames of attr(terms, 'factors'), which also keeps the unneeded
> >>>> backquotes.  Others use the term.labels attribute.  To dodge this I 
> will
> >>>> need to create a fixterms() routine which I call at the top of every
> >>>> single
> >>>> routine in the library.
> >>>> 
> >>>> Is there a chance for a fix at a higher level?
> >>>> 
> >>>> Terry T.
> >>>> 
> >>>> 
> >>>> 
> >>>> On 03/05/2018 03:55 PM, William Dunlap wrote:
> >>>>> 
> >>>>> I believe this has to do with terms() making "term.labels" (hence the
> >>>>> dimnames
> >>>>> of "factors")
> >>>>> with deparse(), so that the backquotes are included for 
> non-syntactic
> >>>>> names.  The backquotes
> >>>>> are not in the column names of the input data.frame (nor model 
> frame) so
> >>>>> you get a mismatch
> >>>>> when subscripting the data.frame or model.frame with elements of
> >>>>> terms()$term.labels.
> >>>>> 
> >>>>> I think you can avoid the problem by adding right after

Re: [Rd] [EXTERNAL] Re: Fwd: Re: Re: backquotes and term.labels

2018-03-08 Thread Ben Bolker

  That's more or less right.  We wrote a terms.merMod method, which
accesses the terms component of the @frame slot, which we have modified
upstream ...

  Do you mean term.labels rather than term.names?

BTW ?terms.object says (under the "term.labels" element):

 Non-syntactic names will be quoted by backticks: this makes
  it easier to re-construct the formula from the term labels.

  This suggests, alas, that this was an intentional design decision --
so harder to change.

  cheers
   Ben


On 18-03-08 11:24 AM, Therneau, Terry M., Ph.D. wrote:
> Ben,
> I looked at the source code you pointed out, and the line
>     fr <- fr[attr(terms(fr),"varnames.fixed")]
> 
> sure looks to me as though the terms() function has returned an object
> with a varnames.fixed attribute.  Either that or your code has inside
> knowledge that a reader like me won't know.  Given what you said I will
> guess that terms(x) returns the terms attribute of x if that already
> exists, doing no processing, and you know that fr has already been
> thusly modified?
> 
> I have code that uses the variables, term.names, and factors attributes
> of the terms() structure, and all of those have the backticks.  I know
> the second two will currently break the coxme and survival packages, I
> haven't chased down the effect on the first.
> 
> 
> On 03/08/2018 08:42 AM, Ben Bolker wrote:
>> Meant to respond to this but forgot.
>>
>>   I didn't write a new terms() function  -- I added an attribute to the
>> terms() (a vector of the names
>> of the constructed model matrix), thus preserving the information at
>> the point when it was available.
>>    I do agree that it would be preferable to have an upstream fix ...
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in man/pretty.Rd

2018-03-19 Thread Ben Bolker
  patch against recent SVN ...
  as far as I can tell this trivial typo has been there for 20 years:

https://github.com/wch/r-source/blame/ba7920a99fb2fb62b89e404e65f8b132ed4c150a/src/library/base/man/pretty.Rd

===
--- pretty.Rd   (revision 74426)
+++ pretty.Rd   (working copy)
@@ -21,8 +21,8 @@
   \item{min.n}{nonnegative integer giving the \emph{minimal} number of
 intervals.  If \code{min.n == 0}, \code{pretty(.)} may return a
 single value.}
-  \item{shrink.sml}{positive numeric
-by a which a default scale is shrunk in the case when
+  \item{shrink.sml}{positive numeric factor
+by which a default scale is shrunk in the case when
 \code{range(x)} is very small (usually 0).}
   \item{high.u.bias}{non-negative numeric, typically \eqn{> 1}.
 The interval unit is determined as \{1,2,5,10\} times \code{b}, a

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] suggested patch for messages

2018-04-08 Thread Ben Bolker
Does anyone have comments on the new wording here?

Suggested:

The Title field should be in title case. Current version is:
   (xxx)
In title case this would be:
   (Xxx)

Hoping R core will see this here and magically adopt it, otherwise
I'll try posting it to the r bugs site ...

===
--- src/library/tools/R/QC.R(revision 74551)
+++ src/library/tools/R/QC.R(working copy)
@@ -7727,8 +7727,10 @@
 "The Title field starts with the package name."
 },
 if(length(y <- x$title_case)) {
-paste(c("The Title field should be in title case,
current version then in title case:",
-sQuote(y)),
+paste(c("The Title field should be in title case.
Current version is:",
+sQuote(y[1]),
+"In title case this would be:",
+sQuote(y[2])),
   collapse = "\n")
 })),
   fmt(c(if(length(x$descr_bad_initial)) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Fwd: suggested patch for messages

2018-04-13 Thread Ben Bolker
Any follow-up/comments on this? If I don't hear back I'll submit to
r-bugs so it doesn't get lost ...

(Don't see any changes in QC.R in the last few days ...
https://github.com/wch/r-source/commits/trunk/src/library/tools/R/QC.R
)

-- Forwarded message --
From: Ben Bolker 
Date: Sun, Apr 8, 2018 at 1:45 PM
Subject: suggested patch for messages
To: r-devel@r-project.org


Does anyone have comments on the new wording here?

Suggested:

The Title field should be in title case. Current version is:
   (xxx)
In title case this would be:
   (Xxx)

Hoping R core will see this here and magically adopt it, otherwise
I'll try posting it to the r bugs site ...

===
--- src/library/tools/R/QC.R(revision 74551)
+++ src/library/tools/R/QC.R(working copy)
@@ -7727,8 +7727,10 @@
 "The Title field starts with the package name."
 },
 if(length(y <- x$title_case)) {
-paste(c("The Title field should be in title case,
current version then in title case:",
-sQuote(y)),
+paste(c("The Title field should be in title case.
Current version is:",
+sQuote(y[1]),
+"In title case this would be:",
+sQuote(y[2])),
   collapse = "\n")
 })),
   fmt(c(if(length(x$descr_bad_initial)) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] readLines() behaves differently for gzfile connection

2018-05-10 Thread Ben Heavner
When I read a .gz file with readLines() in 3.4.3, it returns text (and a
warning). In 3.5.0, it gives a warning, but no text. Is this expected
behavior or a bug?

3.4.3:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "#chr\tpos\tref\talt\t



Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error

> close(readfile_con)

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3

-

3.5.0:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "" "" "" "" ""
Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error
> close(readfile_con)
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0

----
(note: I'm running 3.5.0 via the docker rocker/tidyverse:3.5 container, and
3.4.3 on my mac desktop machine)

Thanks!
Ben Heavner

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] readLines() behaves differently for gzfile connection

2018-05-10 Thread Ben Heavner
You bet - it's available on github at
https://github.com/UW-GAC/wgsaparsr/blob/master/tests/testthat/1k_annotation.gz

-Ben

On Thu, May 10, 2018 at 4:17 PM, Michael Lawrence  wrote:

> Would it be possible to get that file or a representative subset of it
> somewhere so that I can reproduce this?
>
> Thanks,
> Michael
>
> On Thu, May 10, 2018 at 3:31 PM, Ben Heavner  wrote:
> > When I read a .gz file with readLines() in 3.4.3, it returns text (and a
> > warning). In 3.5.0, it gives a warning, but no text. Is this expected
> > behavior or a bug?
> >
> > 3.4.3:
> >> source_file = "1k_annotation.gz"
> >> readfile_con <- gzfile(source_file, "r")
> >> readLines(readfile_con, n = 5)
> > [1] "#chr\tpos\tref\talt\t
> >
> > 
> >
> > Warning message:
> > In readLines(readfile_con, n = 5) :
> >   seek on a gzfile connection returned an internal error
> >
> >> close(readfile_con)
> >
> >> sessionInfo()
> > R version 3.4.3 (2017-11-30)
> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
> > Running under: macOS Sierra 10.12.6
> >
> > Matrix products: default
> > BLAS:
> > /Library/Frameworks/R.framework/Versions/3.4/
> Resources/lib/libRblas.0.dylib
> > LAPACK:
> > /Library/Frameworks/R.framework/Versions/3.4/
> Resources/lib/libRlapack.dylib
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_3.4.3
> >
> > -
> >
> > 3.5.0:
> >> source_file = "1k_annotation.gz"
> >> readfile_con <- gzfile(source_file, "r")
> >> readLines(readfile_con, n = 5)
> > [1] "" "" "" "" ""
> > Warning message:
> > In readLines(readfile_con, n = 5) :
> >   seek on a gzfile connection returned an internal error
> >> close(readfile_con)
> >> sessionInfo()
> > R version 3.5.0 (2018-04-23)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Debian GNU/Linux 9 (stretch)
> >
> > Matrix products: default
> > BLAS: /usr/lib/openblas-base/libblas.so.3
> > LAPACK: /usr/lib/libopenblasp-r0.2.19.so
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
> >  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_3.5.0
> >
> > 
> > (note: I'm running 3.5.0 via the docker rocker/tidyverse:3.5 container,
> and
> > 3.4.3 on my mac desktop machine)
> >
> > Thanks!
> > Ben Heavner
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] aic() component in GLM-family objects

2018-06-03 Thread Ben Bolker


  Is it generally known/has it been previously discussed here that the
$aic() component in GLM-family objects (e.g. results of binomial(),
poisson(), etc.) does not as implemented actually return the AIC, but
rather -2*log-likelihood + 2*(model_has_scale_parameter) ?  Can anyone
in this forum gauge how a documentation patch would be received?

This behaviour does not seem to be documented in ?family (or anywhere
else I can find), which says:

  aic: function giving the AIC value if appropriate (but ‘NA’ for
  the quasi- families).  See ‘logLik’ for the assumptions made
  about the dispersion parameter.

For a demonstration that e.g. binomial()$aic() is really -2*log L and
not the AIC, see:

https://github.com/wch/r-source/blob/trunk/src/library/stats/R/family.R#L317

This document
  <https://github.com/lme4/lme4/blob/master/misc/notes/deviance.rmd>
explicates the details a bit more ('L' denotes log-likelihood):

   * family()$aic computes $-2L$, which glm.fit translates to an AIC by
adding $2k$ and storing it in model$aic
   * logLik.default retrieves model$aic and converts it back to a
log-likelihood
   * stats:::AIC.default retrieves the log-likelihood and converts it
back to an AIC (!)
   * family()$dev.resid() computes the squared deviance residuals
   * stats:::residuals.glm retrieves these values and takes the signed
square root
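
  A quick sketch checking the first two bullets directly (binomial family,
one estimated coefficient; the $aic argument list below is the one glm.fit
uses internally):

set.seed(101)
y <- rbinom(30, size = 1, prob = 0.4)
m <- glm(y ~ 1, family = binomial)
binomial()$aic(y, n = rep(1, 30), mu = fitted(m), wt = rep(1, 30),
               dev = deviance(m))   ## equals -2 * logLik(m), not the AIC
-2 * c(logLik(m))
AIC(m)                              ## the actual AIC: -2 * logLik(m) + 2 * 1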

  cheers
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] aic() component in GLM-family objects

2018-06-17 Thread Ben Bolker
FWIW p. 206 of the White Book gives the following for
names(binomial()): family, names, link, inverse, deriv, initialize,
variance, deviance, weight.

  So $aic wasn't there In The Beginning.  I haven't done any more
archaeology to try to figure out when/by whom it was first introduced
...

 Section 6.3.3, on extending families, doesn't give any other relevant info.

A patch for src/library/stats/man/family.Rd below: please check what
I've said about $aic and $mu.eta to make sure they're correct!  I can
submit this to the r bug list if preferred.


Index: family.Rd
===
--- family.Rd(revision 74904)
+++ family.Rd(working copy)
@@ -31,7 +31,7 @@
 \arguments{
   \item{link}{a specification for the model link function.  This can be
 a name/expression, a literal character string, a length-one character
-vector or an object of class
+vector, or an object of class
 \code{"\link[=make.link]{link-glm}"} (such as generated by
 \code{\link{make.link}}) provided it is not specified
 \emph{via} one of the standard names given next.
@@ -45,7 +45,7 @@
 the \code{Gamma} family the links \code{inverse}, \code{identity}
  and \code{log};
 the \code{poisson} family the links \code{log}, \code{identity},
-and \code{sqrt} and the \code{inverse.gaussian} family the links
+and \code{sqrt}; and the \code{inverse.gaussian} family the links
 \code{1/mu^2}, \code{inverse}, \code{identity}
 and \code{log}.

@@ -105,8 +105,8 @@
 \note{
   The \code{link} and \code{variance} arguments have rather awkward
   semantics for back-compatibility.  The recommended way is to supply
-  them is as quoted character strings, but they can also be supplied
-  unquoted (as names or expressions).  In addition, they can also be
+  them as quoted character strings, but they can also be supplied
+  unquoted (as names or expressions).  Additionally, they can be
   supplied as a length-one character vector giving the name of one of
   the options, or as a list (for \code{link}, of class
   \code{"link-glm"}).  The restrictions apply only to links given as
@@ -130,10 +130,18 @@
   \item{dev.resids}{function giving the deviance residuals as a function
 of \code{(y, mu, wt)}.}
   \item{aic}{function giving the AIC value if appropriate (but \code{NA}
-for the quasi- families).  See \code{\link{logLik}} for the assumptions
-  made about the dispersion parameter.}
-  \item{mu.eta}{function: derivative \code{function(eta)}
-\eqn{d\mu/d\eta}.}
+for the quasi- families).  More precisely, this function
+returns \eqn{-2 L + 2 s}, where \eqn{L} is the log-likelihood and \eqn{s}
+is the number of estimated scale parameters; the penalty term for the
+location parameters is added elsewhere.
+See \code{\link{logLik}} for the assumptions
+made about the dispersion parameter.}
+  \item{mu.eta}{function: derivative of the inverse-link function
+with respect to the linear predictor. If the inverse-link
+function is \eqn{\mu=g^{-1}(\eta)}{mu=ginv(eta)}
+where \eqn{\eta}{eta} is the value
+of the linear predictor, then this function returns
+\eqn{d(g^{-1})/d\eta=d\mu/d\eta}{d(ginv(eta))/d(eta)=d(mu)/d(eta)}.}
   \item{initialize}{expression.  This needs to set up whatever data
 objects are needed for the family as well as \code{n} (needed for
 AIC in the binomial family) and \code{mustart} (see \code{\link{glm}}).}
@@ -224,8 +232,8 @@
 ## which case use an offset of 0 in the corresponding formula
 ## to get the null deviance right.

-## Binomial with identity link: often not a good idea.
-\dontrun{binomial(link = make.link("identity"))}
+## Binomial with identity link: often computationally and conceptually difficult
+\dontrun{binomial(link = "identity")}

 ## tests of quasi
 x <- rnorm(100)
@@ -236,7 +244,7 @@
 glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
 \dontrun{glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails}
 y <- rbinom(100, 1, plogis(x))
-# needs to set a starting value for the next fit
+# need to set a starting value for the next fit
 glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"),
start = c(0,1))
 }
 \keyword{models}

On Mon, Jun 4, 2018 at 10:46 AM, Martin Maechler
 wrote:
>>>>>> Ben Bolker
>>>>>> on Sun, 3 Jun 2018 17:33:18 -0400 writes:
>
> > Is it generally known/has it been previously discussed here that the
> > $aic() component in GLM-family objects (e.g. results of binomial(),
> > poisson(), etc.) does not as implemented actually return the AIC, but
> > rather -2*log-likelihood + 2*(model_has_scale_parameter) ?
>
> This rings a faint bell from the last millennium with me,
> and the follow

[Rd] image() method for Matrix fails on empty matrices (?)

2018-08-18 Thread Ben Bolker


 Reasonably easy to avoid, but maybe an edge case that should be
handled?  Haven't looked yet to see how easy it would be to fix ... Am I
missing something?

> library(Matrix)
> m <- Matrix(0,nrow=3,ncol=3)
> m
3 x 3 sparse Matrix of class "dsCMatrix"

[1,] . . .
[2,] . . .
[3,] . . .
> image(m)
Error in seq.default(zrng[1], zrng[2], length.out = cuts + 2) :
  'from' must be a finite number
In addition: Warning messages:
1: In min(xx, na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf

## now modify matrix
> m[3,3] <- 1
> image(m) ## works fine
## reset:
> m[3,3] <- 0
> image(m) ## fails again

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] proposed patch to /src/library/datasets/man/mtcars.Rd

2018-08-26 Thread Ben Bolker
Mara Averick noticed some oddities in the mtcars data set:

https://twitter.com/dataandme/status/1033341784959709184


I propose the following patch. While anyone with access to JSTOR could
dig in and find this information themselves, it would seem a friendly
gesture to include it ...

  cheers
Ben Bolker

===
--- mtcars.Rd   (revision 75186)
+++ mtcars.Rd   (working copy)
@@ -35,6 +35,14 @@
   Building multiple regression models interactively.
   \emph{Biometrics}, \bold{37}, 391--411.
 }
+\details{
+Henderson and Velleman (1981) comment in a footnote to Table 1:
+\sQuote{Hocking [original transcriber]'s  noncrucial coding of the
+Mazda's rotary engine as a straight six-cylinder engine and the
+Porsche's flat engine as a V engine, as well as the inclusion of the diesel
+Mercedes 240D, have been retained to enable direct comparisons to be made
+with previous analyses.}
+}
 \examples{
 require(graphics)
 pairs(mtcars, main = "mtcars data", gap = 1/4)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Ben Bolker



On 2018-09-19 09:40 AM, David Hugh-Jones wrote:
> On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch 
> wrote:
> 
>>
>> I think the analyses are correct, but I doubt if a change to the default
>> is likely to be accepted as it would make it more difficult to reproduce
>> older results.
> 
> 
> I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a
> statistical language. As a consumer of R I'd like to think that e.g. my
> bootstrapped p values are correct.
> Surely if the old results depend on the biased algorithm, then they are
> false results?
> 

   Balancing backward compatibility and correctness is a tough problem
here.  If this goes into base R, what's the best way to do it?  What was
the protocol for migrating away from the "buggy Kinderman-Ramage"
generator, back in the day?   (Version 1.7 was sometime between 2001 and
2004).

  I couldn't find the exact commit in the GitHub mirror: this is related ...

https://github.com/wch/r-source/commit/7ad3044639fd1fe093c655e573fd1a67aa7f55f6#diff-dbcad570d4fb9b7005550ff630543b37



===
‘normal.kind’ can be ‘"Kinderman-Ramage"’, ‘"Buggy
 Kinderman-Ramage"’ (not for ‘set.seed’), ‘"Ahrens-Dieter"’,
 ‘"Box-Muller"’, ‘"Inversion"’ (the default), or ‘"user-supplied"’.
 (For inversion, see the reference in ‘qnorm’.)  The
 Kinderman-Ramage generator used in versions prior to 1.7.0 (now
 called ‘"Buggy"’) had several approximation errors and should only
 be used for reproduction of old results.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Ben Bolker


  A quick point of order here: arguing with Duncan in this forum is
helpful to expose ideas, but probably neither side will convince the
other; eventually, if you want this adopted in core R, you'll need to
convince an R-core member to pursue this fix.

  In the meantime, a good, well-tested implementation in a
user-contributed package (presumably written in C for speed) would be
enormously useful.  Volunteers ... ?



On 2018-09-19 04:19 PM, Duncan Murdoch wrote:
> On 19/09/2018 3:52 PM, Philip B. Stark wrote:
>> Hi Duncan--
>>
>> Nice simulation!
>>
>> The absolute difference in probabilities is small, but the maximum
>> relative difference grows from something negligible to almost 2 as m
>> approaches 2**31.
>>
>> Because the L_1 distance between the uniform distribution on {1, ...,
>> m} and what you actually get is large, there have to be test functions
>> whose expectations are quite different under the two distributions. 
> 
> That is a mathematically true statement, but I suspect it is not very
> relevant.  Pseudo-random number generators always have test functions
> whose sample averages are quite different from the expectation under the
> true distribution.  Remember Von Neumann's "state of sin" quote.  The
> bug in sample() just means it is easier to find such a function than it
> would otherwise be.
> 
> The practical question is whether such a function is likely to arise in
> practice or not.
> 
>> Whether those correspond to commonly used statistics or not, I have no
>> idea.
> 
> I am pretty confident that this bug rarely matters.
> 
>> Regarding backwards compatibility: as a user, I'd rather the default
>> sample() do the best possible thing, and take an extra step to use
>> something like sample(..., legacy=TRUE) if I want to reproduce old
>> results.
> 
> I suspect there's a good chance the bug I discovered today (non-integer
> x values not being truncated) will be declared to be a feature, and the
> documentation will be changed.  Then the rejection sampling approach
> would need to be quite a bit more complicated.
> 
> I think a documentation warning about the accuracy of sampling
> probabilities would also be a sufficient fix here, and would be quite a
> bit less trouble than changing the default sample().  But as I said in
> my original post, a contribution of a function without this bug would be
> a nice addition.
> 
> Duncan Murdoch
> 
>>
>> Regards,
>> Philip
>>
>> On Wed, Sep 19, 2018 at 9:50 AM Duncan Murdoch
>> mailto:murdoch.dun...@gmail.com>> wrote:
>>
>>     On 19/09/2018 12:23 PM, Philip B. Stark wrote:
>>  > No, the 2nd call only happens when m > 2**31. Here's the code:
>>
>>     Yes, you're right. Sorry!
>>
>>     So the ratio really does come close to 2.  However, the difference in
>>     probabilities between outcomes is still at most 2^-32 when m is less
>>     than that cutoff.  That's not feasible to detect; the only detectable
>>     difference would happen if some event was constructed to hold an
>>     abundance of outcomes with especially low (or especially high)
>>     probability.
>>
>>     As I said in my original post, it's probably not hard to construct
>> such
>>     a thing, but as I've said more recently, it probably wouldn't
>> happen by
>>     chance.  Here's one attempt to do it:
>>
>>     Call the values from unif_rand() "the unif_rand() outcomes".  Call
>> the
>>     values from sample() the sample outcomes.
>>
>>     It would be easiest to see the error if half of the sample() outcomes
>>     used two unif_rand() outcomes, and half used just one.  That would
>> mean
>>     m should be (2/3) * 2^32, but that's too big and would trigger the
>>     other
>>     version.
>>
>>     So how about half use 2 unif_rands(), and half use 3?  That means m =
>>     (2/5) * 2^32 = 1717986918.  A good guess is that sample() outcomes
>>     would
>>     alternate between the two possibilities, so our event could be even
>>     versus odd outcomes.
>>
>>     Let's try it:
>>
>>   > m <- (2/5)*2^32
>>   > m > 2^31
>>     [1] FALSE
>>   > x <- sample(m, 100, replace = TRUE)
>>   > table(x %% 2)
>>
>>        0      1
>>     399850 600150
>>
>>     Since m is an even number, the true proportions of evens and odds
>>     should
>>     be exactly 0.5.  That's some pretty strong evidence of the bug in the
>>     generator.  (Note that the ratio of the observed probabilities is
>> about
>>     1.5, so I may not be the first person to have done this.)
>>
>>     I'm still not convinced that there has ever been a simulation run
>> with
>>     detectable bias compared to Monte Carlo error unless it (like this
>> one)
>>     was designed specifically to show the problem.
>>
>>     Duncan Murdoch
>>
>>  >
>>  > (RNG.c, lines 793ff)
>>  >
>>  > double R_unif_index(double dn)
>>  > {
>>  >      double cut = INT_MAX;
>>  >
>>  >      switch(RNG_kind) {
>>  >      case KNUTH_TAOCP:
>>  >      case USER_UNIF:
>> 

[Rd] NEWS typos, download.file.Rd tweaks

2018-10-09 Thread Ben Bolker

  Some small changes (typo, punctuation fix) to the NEWS file, and some
suggested changes to download.file.Rd (I found some of the phrasing
awkward/hard to parse, and got carried away).  (I *think* .txt
attachments are OK on the list?)

  FWIW I started fixing the download.file man page because I went there
to understand/be horrified by the following (documented!) infelicity:
with the default settings (on Unix at least), failing to download what
is intended to be a new version of a file will wipe out the original,
leaving an empty file.  So, for example, a temporary network problem can
lead to a pipeline being unworkable (because a critical file has been
emptied) until the network comes back ...  I don't suppose there's any
chance this behaviour could be changed ... ??
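
  In the meantime, a defensive wrapper is easy enough.  A sketch
(safe_download is a made-up name; treating any warning as a failure is
deliberately conservative, since some methods only warn when a download
fails):

safe_download <- function(url, destfile) {
    tmp <- tempfile(tmpdir = dirname(destfile)) ## same filesystem, so
                                                ## file.rename() will work
    ok <- tryCatch(utils::download.file(url, tmp, mode = "wb") == 0L,
                   error = function(e) FALSE,
                   warning = function(w) FALSE)
    if (ok) file.rename(tmp, destfile) else unlink(tmp)
    invisible(ok)
}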

> What happens to the destination file(s) in the case of error
 depends on the method and R version. Currently the ‘"internal"’,
 ‘"wininet"’ and ‘"libcurl"’ methods will remove the file if there
 the URL is unavailable except when ‘mode’ specifies appending when
 the file should be unchanged.

Obligatory xkcd: https://xkcd.com/293/

 cheers
   Ben Bolker
Index: doc/NEWS.Rd
===
--- doc/NEWS.Rd (revision 75424)
+++ doc/NEWS.Rd (working copy)
@@ -141,7 +141,7 @@
   i.e., all help pages. show platform-independent information (rather
   than Windows or Unix-alike specifics visible only on that platform).
   Consequently, the Windows version of \code{X11()} / \code{x11()}
-  got identical formal arguments the unix one.
+  now has identical formal arguments to the Unix version.
 
   \item \code{sessionInfo()$running} has been factored out in a new
   variable \code{osVersion}. % precomputed at utils namespace load time
@@ -283,7 +283,7 @@
 
   \item The documentation for \code{identify()} incorrectly claimed
   that the indices of identified points were returned in the order
-  that the points were selected..  \code{identify()} now has a new
+  that the points were selected.  \code{identify()} now has a new
   argument \code{order} to allow the return value to include the
   order in which points were identified; the documentation has been
   updated.  Reported by Richard Rowe and Samuel Granjeaud.
Index: src/library/utils/man/download.file.Rd
===
--- src/library/utils/man/download.file.Rd  (revision 75424)
+++ src/library/utils/man/download.file.Rd  (working copy)
@@ -133,15 +133,16 @@
   \code{"libcurl"} method values of the option less than 2 give verbose
   output.
 
-  A progress bar tracks the transfer platform specifically:
+  A progress bar tracks the transfer, with behavior depending on the
+  platform:
   \describe{
 \item{On Windows}{If the file length is known, the
   full width of the bar is the known length.  Otherwise the initial
   width represents 100 Kbytes and is doubled whenever the current width
-  is exceeded.  (In non-interactive use this uses a text version.  If the
-  file length is known, an equals sign represents 2\% of the transfer
-  completed: otherwise a dot represents 10Kb.)}
-\item{On a unix-alike}{If the file length is known, an
+  is exceeded.  (In non-interactive use the progress bar uses
+  a text version which follows the Unix-alike behavior described
+  below.)}
+\item{On a Unix-alike}{If the file length is known, an
   equals sign represents 2\% of the transfer completed: otherwise a dot
   represents 10Kb.}
   }
@@ -248,7 +249,7 @@
   Note that the root certificates used by \R may or may not be the same
   as used in a browser, and indeed different browsers may use different
   certificate bundles (there is typically a build option to choose
-  either their own or the system ones).
+  either their own or those from the system).
 }
 \section{FTP sites}{
   \samp{ftp:} URLs are accessed using the FTP protocol which has a
@@ -284,8 +285,8 @@
   What happens to the destination file(s) in the case of error depends
   on the method and \R{} version. Currently the \code{"internal"},
   \code{"wininet"} and \code{"libcurl"} methods will remove the file if
-  there the URL is unavailable except when \code{mode} specifies
-  appending when the file should be unchanged.
+  the URL is unavailable except when \code{mode} specifies
+  appending, in which case the file will not be changed.
 }
 \seealso{
   \code{\link{options}} to set the \code{HTTPUserAgent}, \code{timeout}
Index: src/main/eval.c
===
--- src/main/eval.c (revision 75424)
+++ src/main/eval.c (working copy)
@@ -1,4 +1,4 @@
-/*
+   /*
  *  R : A Computer Language for Statistical Data Analysis
  *  Copyright (C) 1998--2018

[Rd] broken link to Titanic data

2018-10-16 Thread Ben Bolker


This URL, which appears in the ?Titanic web page:

 https://www.amstat.org/publications/jse/v3n3/datasets.dawson.html

gives me a 404 error.  Googling "Dawson titanic data" gives

ww2.amstat.org/publications/jse/v3n3/datasets.dawson.html

which is similarly broken.

  Poking around gives this preferred reference URL to the original
article on Taylor & Francis's site:

https://doi.org/10.1080/10691898.1995.11910499

This article is free to access, but it doesn't appear that it's possible
to access the original "titanic.dat.txt" through T&F's web site.  It is
on the internet archive ...

https://web.archive.org/web/20180323070125/https://ww2.amstat.org/publications/jse/datasets/titanic.dat.txt

The codebook is at

https://web.archive.org/web/20180405163149/http://ww2.amstat.org:80/publications/jse/datasets/titanic.txt

  I can submit this as a bug report if that's recommended ...

  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in src/library/stats/man/ts.Rd

2018-11-04 Thread Ben Bolker
 "vector[s]" should be plural in line 54 ...

  cheers
   Ben Bolker



Index: ts.Rd
===
--- ts.Rd   (revision 75540)
+++ ts.Rd   (working copy)
@@ -54,7 +54,7 @@
 }
 \details{
   The function \code{ts} is used to create time-series objects.  These
-  are vector or matrices with class of \code{"ts"} (and additional
+  are vectors or matrices with class of \code{"ts"} (and additional
   attributes) which represent data which has been sampled at equispaced
   points in time.  In the matrix case, each column of the matrix
   \code{data} is assumed to contain a single (univariate) time series.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion for `glm.fit`

2018-11-26 Thread Ben Bolker


 I don't know whether this helps or not, but using residuals(fit) rather
than fit$residuals returns 0 for the last value.  This is different from
(predict(fit,type="response")[25] - Y[25]) (or the equivalent Pearson
residual) because the *weighted* residuals are returned by definition
(not that I see this being explained super-clearly in the documentation) ...

FWIW, ?residuals says:

 The abbreviated form ‘resid’ is an alias for ‘residuals’.  It is
 intended to encourage users to access object components through an
 accessor function rather than by directly referencing an object
 slot.

 - in other words, if you use fit$residuals you're expected to know
exactly what you're doing and how things might get weird ...
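
  Concretely, with fit, Y, and W as constructed in the quoted message
below:

residuals(fit)[25]                   ## 0: deviance residual, respects wt == 0
residuals(fit, type = "pearson")[25] ## 0 as well
fit$residuals[25]                    ## -45.78: the raw working-residual slot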



On 2018-11-26 5:08 p.m., Benjamin Christoffersen wrote:
> Dear sirs,
> 
> One gets unexpected `residuals` if one is not aware of the meaning of
> weights when a weight is set to zero and the outcome is one in the
> `binomial` family in a call to `glm.fit`. The reason is the following
> line from `binomial()$initialize`
>> y[weights == 0] <- 0
> 
> Here is an example:
> pval <- seq(.05, .95, length.out = 25)
> X <- log(pval / (1 - pval)) - 2
> Y <- c(
>   FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
>   FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE,
>   TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
> 
> W <- rep(1, length(Y))
> W[length(W)] <- 0
> fit <- glm(Y ~ X, binomial(), weights = W)
> fit$residuals[25]
> #R        25
> #R -45.77847
> 
> # Maybe it should be the following. Otherwise maybe there should be a
> # warning in `binomial()$initialize` when `y`s are set to zero?
> with(
>   fit, tail((Y - fitted.values) / binomial()$mu.eta(linear.predictors), 1))
> #R   25
> #R 1.022332
> 
> sessionInfo()
> #R R version 3.5.1 (2018-07-02)
> #R Platform: x86_64-w64-mingw32/x64 (64-bit)
> #R Running under: Windows >= 8 x64 (build 9200)
> #R
> #R Matrix products: default
> #R
> #R locale:
> #R [1] LC_COLLATE=English_United States.1252
> #R [2] LC_CTYPE=English_United States.1252
> #R [3] LC_MONETARY=English_United States.1252
> #R [4] LC_NUMERIC=C
> #R [5] LC_TIME=English_United States.1252
> #R
> #R attached base packages:
> #R [1] stats graphics  grDevices utils datasets  methods
> #R [7] base
> #R
> #R loaded via a namespace (and not attached):
> #R [1] compiler_3.5.1 tools_3.5.1yaml_2.1.18
> 
> Sincerely yours,
> Benjamin Christoffersen
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] issue with testInstalledPackage

2018-11-26 Thread Ben Bolker


  FWIW I've been reasonably happy with the revdepcheck package: it's not
base-R, but it's pretty robust (lme4 'only' has 286 dependencies to
check ...)  I've had much better luck running it on a remote server
(where the sysadmin is responsive so it's not too much trouble to get
extra system dependencies/Debian packages installed as they become
necessary).
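
  For anyone curious, the basic workflow is roughly as follows
(revdepcheck is a GitHub-only package, r-lib/revdepcheck; the function
and argument names here are from memory, so check its README):

## remotes::install_github("r-lib/revdepcheck")
library(revdepcheck)
revdep_check(num_workers = 4)  ## run from the package source directory
revdep_report()                ## summarize results/new failures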

On 2018-11-26 3:42 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> Background: I run tools::testInstalledPackage on all packages that depend
> on survival 
> (605 as of today) before sending a new release to CRAN. It has a few false 
> positives which 
> I then follow up on.  (Mostly packages with as-yet-incomplete tests in their 
> inst directory).
> 
>   Issue: testInstalledPackage("mets")  generates an  "Error in 
> checkVignettes(pkg, 
> lib.loc, latex = FALSE, weave = TRUE)" message, which stops my script.  The 
> source code 
> for that package passes R CMD check.
> I can easily work around this in the short term by adding mets to my 
> do-not-check list.
> 
> Footnote: the newer "check_packages_in_dir" routine doesn't work for me.   
> The biggest 
> reason is that it doesn't have a way to give a list of packages to skip.  My 
> little 
> desktop box doesn't have every linux library (cryptography, geospatial, 
> etc.), nor do I 
> load up bioconductor; which leads to a boatload of false positives.  I keep 
> adding things 
> but the packages add faster.
> 
> Terry T.
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug report: Function ppois(0:20, lambda=0.9) does not generate a non-decreasing result.

2018-12-04 Thread Ben Bolker


  I do think it's plausible to expect that we could get *non-decreasing*
results.

I get

any(diff(exp(ppois(0:19, lambda=0.9, log.p=TRUE)))<0)

as FALSE.

But I do get diff(ppois(18:19, lambda=0.9)) < 0.

Looking at the code of ppois, it's just (within C code) calling pgamma
with pgamma(lambda, shape=1+x, scale=1, lower.tail=FALSE):


identical(
  ppois(18:19,0.9),
  pgamma(0.9,shape=1+(18:19),scale=1,lower.tail=FALSE)
)

is TRUE.  So the problem (if we choose to define it as such) would be in
pgamma (upper tail should be a non-decreasing function of the shape
parameter) ... the code is here
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/nmath/pgamma.c
 if anyone wants to dig in ...
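
A direct check that the non-monotonicity shows up in pgamma itself:

pg <- pgamma(0.9, shape = 1 + (0:19), scale = 1, lower.tail = FALSE)
which(diff(pg) < 0)
## 19 (on my platform): the upper tail *decreases* as the shape
## increases, which is impossible for the exact function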




On 2018-12-04 5:46 a.m., Serguei Sokol wrote:
> Le 04/12/2018 à 11:27, Iñaki Ucar a écrit :
>> On Tue, 4 Dec 2018 at 11:12,  wrote:
>>> function ppois is a function calculate the CDF of Poisson
>>> distribution, it should generate a non-decreasing result, but what I
>>> got is:
>>>
 any(diff(ppois(0:19,lambda=0.9))<0)
>>> [1] TRUE
>>>
>>> Actually,
>>>
 ppois(19,lambda=0.9) < ppois(18,lambda=0.9)
>>> [1] TRUE
>>>
>>> Which could not be TRUE.
>> This is just another manifestation of
>>
>> 0.1 * 3 > 0.3
>> #> [1] TRUE
>>
>> This discussion returns to this list from time to time. TLDR; this is
>> not an R issue, but an unavoidable floating point issue.
> Well, here the request may be interpreted not as "do it without round
> error" which is indeed unavoidable but rather "please cope with rounding
> errors in a way that return consistent result for ppois()". You have
> indicated one way to do so (I have just added exp() in the row):
> 
> any(diff(exp(ppois(0:19, lambda=0.9, log.p=TRUE))) < 0)
> #[1] FALSE
> 
> But may be there is another, more economic way?
> 
> Serguei.
> 
>>   Solution:
>> work with log-probabilities instead.
>>
>> any(diff(ppois(0:40, lambda=0.9, log.p=TRUE))<0)
>> #> [1] FALSE
>>
>> Iñaki
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in src/library/base/man/LongVectors.Rd

2018-12-12 Thread Ben Bolker
  Line 23:

"In theory up they can to"

  should be

 "In theory they can be up to"

or (slightly more formally)

  "In theory they can contain up to"

  cheers
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Documentation examples for lm and glm

2018-12-13 Thread Ben Bolker


  Agree.  Or just create the data frame with those variables in it
directly ...
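
  E.g., for the first example in ?lm, something along these lines (data
values as in ?lm):

d <- data.frame(
    weight = c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14,
               4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69),
    group  = gl(2, 10, labels = c("Ctl", "Trt")))
lm.D9 <- lm(weight ~ group, data = d)
predict(lm.D9, newdata = data.frame(group = "Trt"))  ## predict() just works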

On 2018-12-13 3:26 p.m., Thomas Yee wrote:
> Hello,
> 
> something that has been on my mind for a decade or two has
> been the examples for lm() and glm(). They encourage poor style
> because of mismanagement of data frames. Also, having the
> variables in a data frame means that predict()
> is more likely to work properly.
> 
> For lm(), the variables should be put into a data frame.
> As 2 vectors are assigned first in the general workspace they
> should be deleted afterwards.
> 
> For the glm(), the data frame d.AD is constructed but not used. Also,
> its 3 components were assigned first in the general workspace, so they
> float around dangerously afterwards like in the lm() example.
> 
> Rather than attached improved .Rd files here, they are put at
> www.stat.auckland.ac.nz/~yee/Rdfiles
> You are welcome to use them!
> 
> Best,
> 
> Thomas
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unnecessary apostrophe in English base::summary() NA count output?

2018-12-17 Thread Ben Bolker
There seem to be a variety of opinions about style in this case; do
you omit the apostrophe ("NAs") because it's not a possessive or a
contraction, or do you include the apostrophe ("NA's") to clearly
distinguish the acronym from the plural form?

 I personally prefer "NAs" to "NA's" but both are defensible.

https://english.stackexchange.com/questions/55970/plurals-of-acronyms-letters-numbers-use-an-apostrophe-or-not
https://brians.wsu.edu/2016/05/16/acronyms-and-apostrophes/ ("many
people object to it")

On Mon, Dec 17, 2018 at 10:20 AM Hernando Cortina  wrote:
>
>  Hello, this is quite a minor issue but as summary() is in all likelihood
> one of the most widely used functions in R I decided to email this list.
> When producing a count of missing values, summary() in English generates an
> unnecessary and grammatically incorrect apostrophe (NA's rather than NAs)
> in its table header.  For example:
>
> > summary(c(1,2,NA,3,4,NA))
>Min. 1st Qu.  MedianMean 3rd Qu.Max.NA's
>1.001.752.502.503.254.00   2
>
> The issue can be traced to this file:
> https://svn.r-project.org/R/trunk/src/library/base/R/summary.R
> Unless this is being done intentionally for some reason, the solution would
> seem to be to replace the string "NA's" with "NAs".  There are 9
> occurrences in the file.
>
> Thank you very much.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] history of objects() and ls()

2019-01-03 Thread Ben Bolker


  I found out today (maybe I had known sometime before??) that objects()
is a synonym for ls().  I'm curious about the history, which seems to go
at least back to the beginning of R.  It's been thus since SVN revision
2 (Sep 1997) ...

svn cat https://svn.r-project.org/R/trunk/src/library/base/R/attach@2 |
grep objects

  I had a quick look at the Becker & Chambers brown book (1984) and
Becker and Wilks blue book (1988) on Google books and could find ls but
not objects() ... ?

  Anyone happen to know?

 cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] long-standing documentation bug in ?anova.lme

2019-01-17 Thread Ben Bolker
tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
some names are misspelled in ?lme.  I can submit all this stuff as a bug
report if that's preferred.

?anova.lme says:

When only one fitted model object is present, a data frame with
 the sums of squares, numerator degrees of freedom, denominator
 degrees of freedom, F-values, and P-values

The output of

fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
anova(fm1)

gives columns

numDF denDF   F-value p-value

-- i.e. the sums of squares aren't there!  (For fairly good reasons; lme
doesn't actually compute them internally, and it might not always be
straightforward to compute them, for more complex models. They would
mostly be useful for comparison with simpler, method-of-moments based
approaches like aov()). Federico Calboli pointed this out on r-help in
2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html


Two more points:

  * the last sentence of the Description might need one fewer comma
[after "statistic"] or one more [after "p-value"].
  * in ?lme, Littell's name is misspelled at least twice and Reinsel's
at least once.

  Is there a publicly accessible SVN server for recommended packages (in
general) and nlme (in particular) anywhere?

  cheers
  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] long-standing documentation bug in ?anova.lme

2019-01-20 Thread Ben Bolker


  Silence on this so far? Trying here one more time, otherwise I'll
submit it as a bug report ...

  cheers
   Ben Bolker

On 2019-01-17 12:32 p.m., Ben Bolker wrote:
> tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
> some names are misspelled in ?lme.  I can submit all this stuff as a bug
> report if that's preferred.
> 
> ?anova.lme says:
> 
> When only one fitted model object is present, a data frame with
>  the sums of squares, numerator degrees of freedom, denominator
>  degrees of freedom, F-values, and P-values
> 
> The output of
> 
> fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
> anova(fm1)
> 
> gives columns
> 
> numDF denDF   F-value p-value
> 
> -- i.e. the sums of squares aren't there!  (For fairly good reasons; lme
> doesn't actually compute them internally, and it might not always be
> straightforward to compute them, for more complex models. They would
> mostly be useful for comparison with simpler, method-of-moments based
> approaches like aov()). Federico Calboli pointed this out on r-help in
> 2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html
> 
> 
> Two more points:
> 
>   * the last sentence of the Description might need one fewer comma
> [after "statistic"] or one more [after "p-value"].
>   * in ?lme, Littell's name is misspelled at least twice and Reinsel's
> at least once.
> 
>   Is there a publicly accessible SVN server for recommended packages (in
> general) and nlme (in particular) anywhere?
> 
>   cheers
>   Ben Bolker
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] long-standing documentation bug in ?anova.lme

2019-01-21 Thread Ben Bolker

  Here are relevant patches to address the various issues described
below.  Thanks for the SVN info!

  cheers
Ben Bolker


On 2019-01-21 4:54 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Thu, 17 Jan 2019 12:32:20 -0500 writes:
> 
> > tl;dr anova.lme() claims to provide sums of squares, but it doesn't. And
> > some names are misspelled in ?lme.  I can submit all this stuff as a bug
> > report if that's preferred.
> 
> > ?anova.lme says:
> 
> > When only one fitted model object is present, a data frame with
> > the sums of squares, numerator degrees of freedom, denominator
> > degrees of freedom, F-values, and P-values
> 
> > The output of
> 
> > fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
> > anova(fm1)
> 
> > gives columns
> 
> > numDF denDF   F-value p-value
> 
> > -- i.e. the sums of squares aren't there!  (For fairly good reasons; lme
> > doesn't actually compute them internally, and it might not always be
> > straightforward to compute them, for more complex models. They would
> > mostly be useful for comparison with simpler, method-of-moments based
> > approaches like aov()). Federico Calboli pointed this out on r-help in
> > 2004: https://stat.ethz.ch/pipermail/r-help/2004-May/051444.html
> 
> 
> > Two more points:
> 
> > * the last sentence of the Description might need one fewer comma
> > [after "statistic"] or one more [after "p-value"].
> > * in ?lme, Littell's name is misspelled at least twice and Reinsel's
> > at least once.
> 
> We'd be grateful for patches, thank you Ben!
> 
> Notably for 'nlme' and 'foreign', both of which are maintained
> by R-core (rather than individual R core or R Foundation
> members) we've also encouraged that  R's bugzilla be used for
> non-trivial bug reports as that allows attached patches and
> simple references too. 
> 
> 
> > Is there a publicly accessible SVN server for recommended packages (in
> > general) and nlme (in particular) anywhere?
> 
> nlme's SVN is physically at the same place as the R sources
> (here at ETH Zurich), with URL
> 
>https://svn.r-project.org/R-packages/trunk/nlme
> 
> in addition to 'nlme', at least  'foreign', 'mgcv'  and
> 'cluster' are also maintained there.
> 
> Thank you for the question:
>  I do think "we" should add the corresponding  svn URL to the
>  respective DESCRIPTION file.
> 
> OTOH, 'Matrix' has moved to R-forge a while ago .. and I'm
> currently also not sure about the other Recommended packages
> such as 'KernSmooth' or 'boot' . 
> 
> Best,
> Martin
> 
> Martin Maechler
> ETH Zurich and R core team
> 
Index: nlme/DESCRIPTION
===
--- nlme/DESCRIPTION(revision 7616)
+++ nlme/DESCRIPTION(working copy)
@@ -21,3 +21,4 @@
 Encoding: UTF-8
 License: GPL (>= 2) | file LICENCE
 BugReports: https://bugs.r-project.org
+URL: https://svn.r-project.org/R-packages/trunk/nlme
\ No newline at end of file
Index: nlme/man/anova.lme.Rd
===
--- nlme/man/anova.lme.Rd   (revision 7616)
+++ nlme/man/anova.lme.Rd   (working copy)
@@ -61,7 +61,7 @@
 }
 \description{
   When only one fitted model object is present, a data frame with the
-  sums of squares, numerator degrees of freedom, denominator degrees of
+  numerator degrees of freedom, denominator degrees of
   freedom, F-values, and P-values for Wald tests for the terms in the
   model (when \code{Terms} and \code{L} are \code{NULL}), a combination
   of model terms (when \code{Terms} in not \code{NULL}), or linear
@@ -71,7 +71,7 @@
   log-likelihood, the Akaike Information Criterion (AIC), and the
   Bayesian Information Criterion (BIC) of each object is returned.  If
   \code{test=TRUE}, whenever two consecutive  objects have different
-  number of degrees of freedom, a likelihood ratio statistic, with the
+  number of degrees of freedom, a likelihood ratio statistic with the
   associated p-value is included in the returned data frame.
 }
 \value{
Index: nlme/man/lme.Rd
===
--- nlme/man/lme.Rd (revision 7616)
+++ nlme/man/lme.Rd (working copy)
@@ -117,8 +117,8 @@
   (1982).  The variance-covariance parametrizations are described in
   Pinheiro and Bates (1996).  The different correlation structures
   available for the \code{correlatio

Re: [Rd] Runnable R packages

2019-02-02 Thread Ben Bolker


  remotes has fewer dependencies.  I believe that the current version of
devtools just re-exports install_github etc. from the remotes package.
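
  So, assuming a current devtools, these should be interchangeable; the
difference is only the dependency footprint:

remotes::install_github("lme4/lme4")
devtools::install_github("lme4/lme4")  ## re-exported from remotes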

On 2019-02-02 11:31 a.m., David Lindelof wrote:
> I see some value in Duncan’s proposal to implement this as an extra package
> instead of a change to base R, if only to see if the idea has legs. I’m
> minded to do so myself using your suggestion, but is there a particular
> reason why you recommend using the remotes package instead of devtools? The
> latter seems to have the same functions I would need, and I believe it is
> more widely installed that remotes?
> 
> Kind regards,
> 
> From: Duncan Murdoch  
> Reply: Duncan Murdoch  
> Date: 2 February 2019 at 15:37:16
> To: Barry Rowlingson 
> , Abs Spurdle 
> 
> Cc: r-devel  
> Subject:  Re: [Rd] Runnable R packages
> 
> On 02/02/2019 8:27 a.m., Barry Rowlingson wrote:
>> I don't think anyone denies that you *could* make an EXE to do all
>> that. The discussion is on *how easy* it should be to create a single
>> file that contains an initial "main" function plus a set of bundled
>> code (potentially as a package) and which when run will install its
>> package code (which is contained in itself, its not in a repo),
>> install dependencies, and run the main() function.
>>
>> Now, I could build a self-executable shar file that bundled a package
>> together with a script to do all the above. But if there was a "RUN"
>> command in R, and a convention that a function called "foo::main"
>> would be run by `R CMD RUN foo_1.1.1.tar.gz` then it would be so much
>> easier to develop and test.
> 
> I don't believe the "so much easier" argument that this requires a
> change to base R. If you put that functionality into a package, then
> the only extra effort the user would require is to install that other
> package. After that, they could run
> 
> Rscript -e "yourpackage::run_main('foo_1.1.1.tar.gz')"
> 
> as I suggested before. This is no harder than running
> 
> R CMD RUN foo_1.1.1.tar.gz
> 
> The advantage of this from R Core's perspective is that you would be
> developing and maintaining "yourpackage", you wouldn't be passing the
> burden on to them. The advantage from your perspective is that you
> could work with whatever packages you liked. The "remotes" package has
> almost everything you need so that "yourpackage" could be nearly
> trivial. You wouldn't need to duplicate it within base R.
> 
> Duncan Murdoch
> 
>>
>> If people think this adds value, then if they want to offer that value
>> to me as $ or £, I'd consider writing it if their total value was more
>> than my cost
>>
>> Barry
>>
>>
>> On Sat, Feb 2, 2019 at 12:54 AM Abs Spurdle  wrote:
>>>
>>> Further to my previous post,
>>> it would be possible to create an .exe file, say:
>>>
>>> my_r_application.exe
>>>
>>> That starts R, loads your R package(s), calls the R function of your
> choice
>>> and does whatever else you want.
>>>
>>> However, I don't think that it would add much value.
>>> But feel free to correct me if you think that I'm wrong.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> David Lindelöf, Ph.D.
> +41 (0)79 415 66 41  or skype:david.lindelof
> http://computersandbuildings.com
> Follow me on Twitter:
> http://twitter.com/dlindelof
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] code for sum function

2019-02-19 Thread Ben Bolker


This SO question may be of interest:

https://stackoverflow.com/questions/38589705/difference-between-rs-sum-and-armadillos-accu/

  which points out that sum() isn't doing anything fancy *except* using
extended-precision registers when available.  (Using Kahan's algorithm
does come at a computational cost ...)
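
  A quick way to see the extended-precision effect (platform-dependent:
on builds with 80-bit long doubles the two typically differ):

x <- rep(1/7, 1e6)
s <- 0
for (xi in x) s <- s + xi  ## plain double-precision accumulation
sum(x) == s                ## typically FALSE: sum() accumulates in long double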

On 2019-02-19 2:08 p.m., William Dunlap via R-devel wrote:
> The algorithm does make a differece.  You can use Kahan's summation
> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
> reduce the error compared to the naive summation algorithm.  E.g., in R
> code:
> 
> naiveSum <-
> function(x) {
>s <- 0.0
>for(xi in x) s <- s + xi
>s
> }
> kahanSum <- function (x)
> {
>s <- 0.0
>c <- 0.0 # running compensation for lost low-order bits
>for(xi in x) {
>   y <- xi - c
>   t <- s + y # low-order bits of y may be lost here
>   c <- (t - s) - y
>   s <- t
>}
>s
> }
> 
>> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
>> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0)
>> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0)
>>
>> table(rSum == rNaiveSum)
> 
> FALSE  TRUE
>    21     5
>> table(rSum == rKahanSum)
> 
> FALSE  TRUE
>     3    23
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> 
> On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert  wrote:
> 
>> (I didn't see anyone else answer this, so ...)
>>
>> You can probably find the R code in src/main/ but I'm not sure. You are
>> talking about a very simple calculation, so it seems unlike that the
>> algorithm is the cause of the difference. I have done much more
>> complicated things and usually get machine precision comparisons. There
>> are four possibilities I can think of that could cause (small) differences.
>>
>> 0/ Your code is wrong, but that seems unlikely on such a simple
>> calculations.
>>
>> 1/ You are summing a very large number of numbers, in which case the sum
>> can become very large compared to numbers being added, then things can
>> get a bit funny.
>>
>> 2/ You are using single precision in fortran rather than double. Double
>> is needed for all floating point numbers you use!
>>
>> 3/ You have not zeroed the double precision numbers in fortran. (Some
>> compilers do not do this automatically and you have to specify it.) Then
>> if you accidentally put singles, like a constant 0.0 rather than a
>> constant 0.0D+0, into a double you will have small junk in the lower
>> precision part.
>>
>> (I am assuming you are talking about a sum of reals, not integer or
>> complex.)
>>
>> HTH,
>> Paul Gilbert
>>
>> On 2/14/19 2:08 PM, Rampal Etienne wrote:
>>> Hello,
>>>
>>> I am trying to write FORTRAN code to do the same as some R code I have.
>>> I get (small) differences when using the sum function in R. I know there
>>> are numerical routines to improve precision, but I have not been able to
>>> figure out what algorithm R is using. Does anyone know this? Or where
>>> can I find the code for the sum function?
>>>
>>> Regards,
>>>
>>> Rampal Etienne
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mle (stat4) crashing due to singular Hessian in covariance matrix calculation

2019-02-19 Thread Ben Bolker


  I don't know if this will get much response from the R developers;
they might just recommend that you protect your mle() call in a try() or
tryCatch() to stop it from breaking your loop.  Alternatively, you could
try mle2() function in the bbmle package, which started out long ago as
a slightly more flexible and robust version of stats4::mle(); I don't
remember/can't promise that it handles fits with singular Hessians, but
I'm guessing it does ...
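
  A sketch of that tryCatch() protection ('minuslogl' and 'start_list'
are placeholders for whatever you're fitting):

fit <- tryCatch(stats4::mle(minuslogl, start = start_list),
                error = function(e) NULL)
ll <- if (is.null(fit)) NA_real_ else as.numeric(stats4::logLik(fit))
## record ll and carry on with the next iteration of the loop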

  cheers
   Ben Bolker


On 2019-02-19 12:02 p.m., Francisco Matorras wrote:
> Hi, R developers.
> when running mle inside a loop I found a nasty behavior. From time to
> time, my model had a degenerate minimum and the loop just crashed. I
> tracked it down to "vcov <- if (length(coef)) solve(oout$hessian)" line,
> being the hessian singular.
> Note that the minimum reached was good, it just did not make sense to
> calculate the covariance matrix as the inverse of a singular Hessian. In
> my case i am just interested on the value of the log-likelihood. For my
> application, I patched it easily in a local version of mle just removing
> this call since I am not using vcov at all, but i wonder if it can be
> improved in the official release. I can imagine of two simple solutions,
> either including vcov calculation as an option or avoiding the call to
> solve if the hessian is singular (setting vcov to NA). I am willing to
> write a few lines of coded if you think it is worth.
> 
> regards
> 
> Francisco Matorras
> Instituto de Física de Cantabria
> Universidad de Cantabria
> 
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] model.matrix.default() silently ignores bad contrasts.arg

2019-02-20 Thread Ben Bolker
An lme4 user pointed out <https://github.com/lme4/lme4/issues/491> that
passing contrasts as a string or symbol to [g]lmer (which would work if
we were using `contrasts<-` to set contrasts on a factor variable) is
*silently ignored*. This goes back to model.matrix(), and seems bad
(this is a very easy mistake to make, because of the multitude of ways
to specify contrasts for factors in R  - e.g. options(contrasts=...);
setting contrasts on the specific factors; passing contrasts as a list
to the model function ... )

The relevant code is here:

https://github.com/wch/r-source/blob/trunk/src/library/stats/R/models.R#L578-L603

The following code shows the problem: a plain-vanilla model.matrix()
call with no contrasts argument, followed by two wrong contrasts
arguments, followed by a correct contrasts argument.

data(cbpp, package="lme4")
mf1 <- model.matrix(~period, data=cbpp)
mf2 <- model.matrix(~period, contrasts.arg="contr.sum", data=cbpp)
all.equal(mf1,mf2) ## TRUE
mf3 <- model.matrix(~period, contrasts.arg=contr.sum, data=cbpp)
all.equal(mf1,mf3)  ## TRUE
mf4 <- model.matrix(~period, contrasts.arg=list(period=contr.sum),
data=cbpp)
isTRUE(all.equal(mf1,mf4))  ## FALSE


  I've attached a potential patch for this, which is IMO the mildest
possible case (if contrasts.arg is non-NULL and not a list, it produces
a warning).  I haven't been able to test it because of some mysterious
issues I'm having with re-making R properly ...

  Thoughts?  Should I submit this as a bug report/patch?

  cheers
   Ben Bolker


Index: src/library/stats/R/models.R
===
--- src/library/stats/R/models.R	(revision 76140)
+++ src/library/stats/R/models.R	(working copy)
@@ -589,20 +589,23 @@
 contrasts(data[[nn]]) <- contr.funs[1 + isOF[nn]]
 ## it might be safer to have numerical contrasts:
 ##	  get(contr.funs[1 + isOF[nn]])(nlevels(data[[nn]]))
-if (!is.null(contrasts.arg) && is.list(contrasts.arg)) {
-if (is.null(namC <- names(contrasts.arg)))
-stop("invalid 'contrasts.arg' argument")
-for (nn in namC) {
-if (is.na(ni <- match(nn, namD)))
-warning(gettextf("variable '%s' is absent, its contrast will be ignored", nn),
-domain = NA)
-else {
-ca <- contrasts.arg[[nn]]
-if(is.matrix(ca)) contrasts(data[[ni]], ncol(ca)) <- ca
-else contrasts(data[[ni]]) <- contrasts.arg[[nn]]
-}
-}
-}
+if (!is.null(contrasts.arg)) {
+if (!is.list(contrasts.arg)) {
+warning("non-list contrasts argument ignored")
+} else {  ## contrasts.arg is a list
+if (is.null(namC <- names(contrasts.arg)))
+stop("'contrasts.arg' argument must be named")
+for (nn in namC) {
+if (is.na(ni <- match(nn, namD)))
+warning(gettextf("variable '%s' is absent, its contrast will be ignored", nn),
+domain = NA)
+else {
+ca <- contrasts.arg[[nn]]
+if(is.matrix(ca)) contrasts(data[[ni]], ncol(ca)) <- ca
+else contrasts(data[[ni]]) <- contrasts.arg[[nn]]
+}}
+} ## contrasts.arg is a list
+} ## non-null contrasts.arg
 } else { #  no rhs terms ('~1', or '~0'): internal model.matrix needs some variable
 	isF <- FALSE
 	data[["x"]] <- raw(nrow(data))
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] model.matrix.default() silently ignores bad contrasts.arg

2019-02-21 Thread Ben Bolker
On Thu, Feb 21, 2019 at 7:49 AM Fox, John  wrote:
>
> Dear Ben,
>
> Perhaps I'm missing the point, but contrasts.arg is documented to be a list. 
> From ?model.matrix: "contrasts.arg: A list, whose entries are values (numeric 
> matrices or character strings naming functions) to be used as replacement 
> values for the contrasts replacement function and whose names are the names 
> of columns of data containing factors."

  I absolutely agree that this is not a bug/behaves as documented (I
could have said that more clearly).  It's just that (for reasons I
attempted to explain) this is a really easy mistake to make.

> This isn't entirely accurate because a function also works as a named element 
> of the list (in addition to a character string naming a function and a 
> contrast matrix), as your example demonstrates, but nowhere that I'm aware of 
> is it suggested that a non-list should work.
>
> It certainly would be an improvement if specifying contrasts.arg as a non-list
> generated an error or warning message, and it at least arguably would be
> convenient to allow a general contrast specification such as
> contrasts.arg="contr.sum", but I don't see a bug here.

  I agree.  That's what my patch does (throws a warning message if
contrasts.arg is non-NULL and not a list).

  cheers
   Ben Bolker



>
> Best,
>  John
>
>   -
>   John Fox, Professor Emeritus
>   McMaster University
>   Hamilton, Ontario, Canada
>   Web: http://socserv.mcmaster.ca/jfox
>
> > On Feb 20, 2019, at 7:14 PM, Ben Bolker  wrote:
> >
> > An lme4 user pointed out <https://github.com/lme4/lme4/issues/491> that
> > passing contrasts as a string or symbol to [g]lmer (which would work if
> > we were using `contrasts<-` to set contrasts on a factor variable) is
> > *silently ignored*. This goes back to model.matrix(), and seems bad
> > (this is a very easy mistake to make, because of the multitude of ways
> > to specify contrasts for factors in R  - e.g. options(contrasts=...);
> > setting contrasts on the specific factors; passing contrasts as a list
> > to the model function ... )
> >
> > The relevant code is here:
> >
> > https://github.com/wch/r-source/blob/trunk/src/library/stats/R/models.R#L578-L603
> >
> > The following code shows the problem: a plain-vanilla model.matrix()
> > call with no contrasts argument, followed by two wrong contrasts
> > arguments, followed by a correct contrasts argument.
> >
> > data(cbpp, package="lme4")
> > mf1 <- model.matrix(~period, data=cbpp)
> > mf2 <- model.matrix(~period, contrasts.arg="contr.sum", data=cbpp)
> > all.equal(mf1,mf2) ## TRUE
> > mf3 <- model.matrix(~period, contrasts.arg=contr.sum, data=cbpp)
> > all.equal(mf1,mf3)  ## TRUE
> > mf4 <- model.matrix(~period, contrasts.arg=list(period=contr.sum),
> > data=cbpp)
> > isTRUE(all.equal(mf1,mf4))  ## FALSE
> >
> >
> >  I've attached a potential patch for this, which is IMO the mildest
> > possible case (if contrasts.arg is non-NULL and not a list, it produces
> > a warning).  I haven't been able to test it because of some mysterious
> > issues I'm having with re-making R properly ...
> >
> >  Thoughts?  Should I submit this as a bug report/patch?
> >
> >  cheers
> >   Ben Bolker
> >
> >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] code for sum function

2019-02-21 Thread Ben Bolker
Specifically: https://svn.r-project.org/R/trunk/src/main/summary.c

And if you don't want to deal with Subversion, you can look at the
read-only github mirror:

https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/main/summary.c#L115-L131

On Thu, Feb 21, 2019 at 11:57 AM David Winsemius  wrote:
>
>
> On 2/20/19 2:55 PM, Rampal Etienne wrote:
> > Dear Tomas,
> >
> > Where do I find these files? Do they contain the code for the sum function?
>
> Yes.
>
> https://svn.r-project.org/R/trunk/
>
>
> David
>
> >
> > What do you mean exactly with your point on long doubles? Where can I find
> > documentation on this?
> >
> > Cheers, Rampal
> >
> > On Mon, Feb 18, 2019, 15:38 Tomas Kalibera  >
> >> See do_summary() in summary.c, rsum() for doubles. R uses long double
> >> type as accumulator on systems where available.
> >>
> >> Best,
> >> Tomas
> >>
> >> On 2/14/19 2:08 PM, Rampal Etienne wrote:
> >>> Hello,
> >>>
> >>> I am trying to write FORTRAN code to do the same as some R code I
> >>> have. I get (small) differences when using the sum function in R. I
> >>> know there are numerical routines to improve precision, but I have not
> >>> been able to figure out what algorithm R is using. Does anyone know
> >>> this? Or where can I find the code for the sum function?
> >>>
> >>> Regards,
> >>>
> >>> Rampal Etienne
> >>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
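
(As a side note, not from the thread: you can check from R whether a given
build actually has the long-double accumulator that rsum() uses.)

capabilities("long.double")   # FALSE on builds configured without long doubles
.Machine$double.digits        # 53: mantissa bits of a plain double
.Machine$longdouble.digits    # typically 64 on x86 builds that use long doubles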


Re: [Rd] model.matrix.default() silently ignores bad contrasts.arg

2019-02-23 Thread Ben Bolker


 thanks!

On 2019-02-23 5:42 a.m., Martin Maechler wrote:
>>>>>> Fox, John 
>>>>>> on Fri, 22 Feb 2019 17:40:15 +0000 writes:
> 
> > Dear Martin and Ben, I agree that a warning is a good idea
> > (and perhaps that wasn't clear in my response to Ben's
> > post).
> 
> > Also, it would be nice to correct the omission in the help
> > file, which as far as I could see doesn't mention that a
> > contrast-generating function (as opposed to its quoted
> > name) can be an element of the contrasts.arg list.
> 
> > Best, John
> 
> Thank you John for the clarification and the reminder about
> filling the omission there!
> 
> Prepared to go (into the sources) now.
> Martin
> 
> 
> >> -----Original Message-----
> >> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
> >> Sent: Friday, February 22, 2019 11:50 AM
> >> To: Ben Bolker 
> >> Cc: Fox, John ; r-devel@r-project.org
> >> Subject: Re: [Rd] model.matrix.default() silently ignores bad contrasts.arg
> >> 
> >> >>>>> Ben Bolker
> >> >>>>> on Thu, 21 Feb 2019 08:18:51 -0500 writes:
> >> 
> >> > On Thu, Feb 21, 2019 at 7:49 AM Fox, John  wrote:
> >> >>
> >> >> Dear Ben,
> >> >>
> >> >> Perhaps I'm missing the point, but contrasts.arg is documented to be
> >> >> a list. From ?model.matrix: "contrasts.arg: A list, whose entries are
> >> >> values (numeric matrices or character strings naming functions) to be
> >> >> used as replacement values for the contrasts replacement function and
> >> >> whose names are the names of columns of data containing factors."
> >> 
> >> > I absolutely agree that this is not a bug/behaves as documented (I
> >> > could have said that more clearly).  It's just that (for reasons I
> >> > attempted to explain) this is a really easy mistake to make.
> >> 
> >> >> This isn't entirely accurate because a function also works as a named
> >> >> element of the list (in addition to a character string naming a
> >> >> function and a contrast matrix), as your example demonstrates, but
> >> >> nowhere that I'm aware of is it suggested that a non-list should work.
> >> >>
> >> >> It certainly would be an improvement if specifying contrasts.arg as a
> >> >> non-list generated an error or warning message, and it at least
> >> >> arguably would be convenient to allow a general contrast specification
> >> >> such as contrasts.arg="contr.sum", but I don't see a bug here.
> >> 
> >> > I agree.  That's what my patch does (throws a warning message if
> >> > contrasts.arg is non-NULL and not a list).
> >> 
> >> I currently do think this is a good idea... "even though" I'm 99% sure
> >> that this will make work for package maintainers and others whose code
> >> may suddenly show warnings.  I hope they would know better than
> >> suppressWarnings(.) ...
> >> 
> >> I see a version of the patch using old style indentation which makes the
> >> diff even "considerably" smaller -- no need to submit this differently,
> >> though -- and I plan to test that a bit, and commit eventually to
> >> R-devel, possibly in 5 days or so.
> >> 
> >> Thank you Ben for the suggestion and patch !  Martin
> >> 
> >> > cheers
> >> > Ben Bolker
> >> 
> >> >> Best,
> >> >> John
> >> >>
> >> >> -----------------------------
> >> >> John Fox, Professor Emeritus
> >> >> McMaster University
> >> >> Hamilton, Ontario, Canada
> >> >> Web: http://socserv.mcmaster.ca/jfox
> >> >>
> >> >> > On Feb 20, 2019, at 7:14 PM, Ben Bolker  wrote:
> >> >> >
> >> >> > An lme4 user pointed out <https://github.com/lme4/lme4/issues/491>
> >> >> > that passing contrasts as a string or symbol to [g]lmer (which would

[Rd] r76237 broken?

2019-03-14 Thread Ben Bolker
It looks like the most recent SVN commit changed line 1068 of
src/library/tools/R/admin.R to include a call to "shQuotee" (sic), which
is now breaking Travis r-devel builds ... ('checking sizes of PDF files
under ‘inst/doc’: .Error in shQuotee(tf) : could not find function
"shQuotee"')

The new line reads:

res <- system2(qpdf, c(qpdf_flags, shQuote(p), shQuotee(tf)),
FALSE, FALSE)

see:

svn diff -r76237:76236 src/library/tools/R/admin.R

 Seems like a straight-up typo?

  cheers
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] r76237 broken?

2019-03-14 Thread Ben Bolker
PS there's also a "shQoute(tf2)" on line 1063 that will presumably also
cause trouble at some point ...

On 2019-03-14 3:57 p.m., Ben Bolker wrote:
> It looks like the most recent SVN commit changed line 1068 of
> src/library/tools/R/admin.R to include a call to "shQuotee" (sic), which
> is now breaking Travis r-devel builds ... ('checking sizes of PDF files
> under ‘inst/doc’: .Error in shQuotee(tf) : could not find function
> "shQuotee"')
> 
> The new line reads:
> 
> res <- system2(qpdf, c(qpdf_flags, shQuote(p), shQuotee(tf)),
> FALSE, FALSE)
> 
> see:
> 
> svn diff -r76237:76236 src/library/tools/R/admin.R
> 
>  Seems like a straight-up typo?
> 
>   cheers
> Ben Bolker
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column

2019-03-26 Thread Ben Bolker


  You need admin assistance, someone will probably see your request here
and fulfill it.

  It might be helpful to read this question/answer on StackOverflow
discussing the context of proposing patches to base R functionality ...

https://stackoverflow.com/questions/8065835/proposing-feature-requests-to-the-r-core-team

  cheers
Ben Bolker


On 2019-03-26 4:20 p.m., Kurt Van Dijck wrote:
> On di, 26 mrt 2019 12:48:12 -0700, Michael Lawrence wrote:
>>Please file a bug on bugzilla so we can discuss this further.
> 
> All fine.
> I didn't find a way to create an account on bugs.r-project.org.
> Did I just not see it? or do I need administrator assistance?
> 
> Kind regards,
> Kurt
> 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Ben Bolker
   Just to clarify/amplify: on the bug tracking system there's a
drop-down menu to specify severity, and "enhancement" is one of the
choices, so you don't have to worry that you're misrepresenting your
patch as fixing a bug.

  The fact that an R-core member (Michael Lawrence) thinks this is
worth looking at is very encouraging (and somewhat unusual for
feature/enhancement suggestions)!

  Ben Bolker

On Wed, Mar 27, 2019 at 5:29 PM Michael Lawrence via R-devel
 wrote:
>
> This has some nice properties:
>
> 1) It self-documents the input expectations in a similar manner to
> colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
> calling it on each chunk of an iterative read operation.
>
> The implementation needs work though, and I'm not convinced that coercion
> failures should fall back gracefully to the default.
>
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
>
> Thanks,
> Michael
>
> On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <
> dev.k...@vandijck-laurijssen.be> wrote:
>
> > Thank you for your answers.
> > I rather do not file a new bug, since what I coded isn't really a bug.
> >
> > The problem I (my colleagues) have today is very stupid:
> > We read .csv files with a lot of columns, of which most contain
> > date-time stamps, coded in DD/MM/YYYY HH:MM.
> > This is not exotic, but the base library's read.table (and derivatives)
> > only accept date-times in a limited number of possible formats (which I
> > understand very well).
> >
> > We could specify a format in a rather complicated format, for each
> > column individually, but this syntax is rather difficult to maintain.
> >
> > My solution to this specific problem became trivial, yet generic
> > extension to read.table.
> > Rather than relying on the built-in type detection, I added a parameter
> > -- a function that will be called for each to-be-type-probed column -- so
> > I can overrule the built-in limited default.
> > If nothing returns from the function, the built-in default is still
> > used.
> >
> > This way, I could construct a type-probing function that is
> > straight-forward, not hard to code, and makes reading my .csv files
> > acceptible in terms of code (read.table parameters).
> >
> > I'm sure I'm not the only one dealing with such needs -- especially
> > date-time formats, which exist in enormous variety -- but I want to stress here
> > that my approach is agnostic to my specific problem.
> >
> > For those asking to 'show me the code', I redirect to my 2nd patch,
> > where the tests have been extended with my specific problem.
> >
> > What are your opinions about this?
> >
> > Kind regards,
> > Kurt
> >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
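
To make the hook idea concrete, here is a rough sketch (mine, not Kurt's
actual patch; probe(), its signature, and "times.csv" are invented) of the
same per-column probing applied by hand after an all-character read:

probe <- function(x) {                 # return NULL to keep the default
  if (all(grepl("^\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}$", x)))
    as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
}
df <- read.csv("times.csv", colClasses = "character")
df[] <- lapply(df, function(x) {
  y <- probe(x)
  if (is.null(y)) type.convert(x, as.is = TRUE) else y
})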


Re: [Rd] Bug in the "reformulate" function in stats package

2019-03-29 Thread Ben Bolker
  I suspect that the issue is addressed (obliquely) in the examples,
which shows that variables with spaces in them (or otherwise
'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
can be handled by protecting them with backticks  (``)

 ## using non-syntactic names:
 reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))

It seems to me there could be room for a *documentation* patch (stating
explicitly that if termlabels has length > 1 its elements are
concatenated with "+", and explicitly stating that non-syntactic names
must be protected with back-ticks).  (There is a little bit of obscurity
in the fact that the elements of termlabels don't have to be
syntactically valid names: many will be included in formulas if they can
be interpreted as *parseable* expressions, e.g. reformulate("x<2"))

  I would be happy to give it a shot if the consensus is that it would
be worthwhile.

   One workaround to the OP's problem is below (may be worth including
as an example in docs)

> z <- c("a variable","another variable")
> reformulate(z)
Error in parse(text = termtext, keep.source = FALSE) :
  <text>:1:6: unexpected symbol
1:  ~ a variable
 ^
> reformulate(sprintf("`%s`",z))
~`a variable` + `another variable`




On 2019-03-29 11:54 a.m., J C Nash wrote:
> The main thing is to post the "small reproducible example".
> 
> My (rather long term experience) can be written
> 
>   if (exists("reproducible example") ) {
>  DeveloperFixHappens()
>   } else {
>  NULL
>   }
> 
> JN
> 
> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. 
>> Also, I don't know if I have the time at
>> the moment to provide tests, multiple examples or more. If that is not ok or 
>> welcomed, that is fine, I can come back,
>> whenever I have more time to properly report the bug.
>>
>> I didn't find the existing bug report, sorry for that.
>>
>> Yes, it is related. My problem was that I have column names with spaces and 
>> current solution doesn't solve it. I have a
>> solution, which works for me and maybe also for others.
>>
>> Either, someone can register me to bugzilla or I can post it here, which 
>> could give some direction to developers. I
>> don't mind whichever is preferred here.
>>
>> Best,
>>
>> Saren
>>
>>
>> On 29.03.19 09:29, Martin Maechler wrote:
>>> >>>>> Saren Tasciyan
>>> >>>>>  on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>>>  > Hi,
>>>  > I have found a bug in reformulate function and have a solution for 
>>> it. I
>>>  > was wondering, where I can submit it?
>>>
>>>  > Best,
>>>  > Saren
>>>
>>>
>>> Well, you could have given a small reproducible example
>>> depicting the bug, notably when posting here:
>>> Just a prose text with no R code or other technical content is
>>> almost always not really appropriate for the R-devel mailing list.
>>>
>>> Further, in such a case you should google a bit and hopefully
>>> have found
>>>     https://www.r-project.org/bugs.html
>>>
>>> which also mention reproducibility (and many more useful things).
>>>
>>> Then it also tells you about R's bug repository, also called
>>> "R's bugzilla" at https://bugs.r-project.org/
>>>
>>> and if you are diligent (but here, I'd say bugzilla is
>>> (configured?) far from ideal), you'd also find bug PR#17359
>>>
>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>>>
>>> which was reported already on Nov 2017 .. and only fixed
>>> yesterday (in the "cleanup old bugs" process that happens
>>> often before the big new spring release of R).
>>>
>>> So is your bug the same as that one?
>>>
>>> Martin
>>>
>>>  > --
>>>  > Saren Tasciyan
>>>  > /PhD Student / Sixt Group/
>>>  > Institute of Science and Technology Austria
>>>  > Am Campus 1
>>>  > 3400 Klosterneuburg, Austria
>>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in the "reformulate" function in stats package

2019-04-04 Thread Ben Bolker

  Proposed patch (I think .txt files work OK as attachments to the list?)

On 2019-04-04 2:21 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
> 
> > I suspect that the issue is addressed (obliquely) in the examples,
> > which shows that variables with spaces in them (or otherwise
> > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
> > can be handled by protecting them with backticks  (``)
> 
> > ## using non-syntactic names:
> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
> 
> > It seems to me there could be room for a *documentation* patch (stating
> > explicitly that if termlabels has length > 1 its elements are
> > concatenated with "+", and explicitly stating that non-syntactic names
> > must be protected with back-ticks).  (There is a little bit of obscurity
> > in the fact that the elements of termlabels don't have to be
> > syntactically valid names: many will be included in formulas if they can
> > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
> 
> > I would be happy to give it a shot if the consensus is that it would
> > be worthwhile.
> 
> I think it would be worthwhile to add to the docs a bit.
> 
> [With currently just your and my vote, we have a 100% consensus
> ;-)]
> 
> Martin
> 
> > One workaround to the OP's problem is below (may be worth including
> > as an example in docs)
> 
> >> z <- c("a variable","another variable")
> >> reformulate(z)
> > Error in parse(text = termtext, keep.source = FALSE) :
> > <text>:1:6: unexpected symbol
> > 1:  ~ a variable
> > ^
> >> reformulate(sprintf("`%s`",z))
> > ~`a variable` + `another variable`
> 
> 
> 
> 
> > On 2019-03-29 11:54 a.m., J C Nash wrote:
> >> The main thing is to post the "small reproducible example".
> >> 
> >> My (rather long term experience) can be written
> >> 
> >> if (exists("reproducible example") ) {
> >> DeveloperFixHappens()
> >> } else {
> >> NULL
> >> }
> >> 
> >> JN
> >> 
> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
> >>> Well, first I can't sign in bugzilla myself, that is why I wrote here 
> first. Also, I don't know if I have the time at
> >>> the moment to provide tests, multiple examples or more. If that is 
> not ok or welcomed, that is fine, I can come back,
> >>> whenever I have more time to properly report the bug.
> >>> 
> >>> I didn't find the existing bug report, sorry for that.
> >>> 
> >>> Yes, it is related. My problem was that I have column names with 
> spaces and current solution doesn't solve it. I have a
> >>> solution, which works for me and maybe also for others.
> >>> 
> >>> Either, someone can register me to bugzilla or I can post it here, 
> which could give some direction to developers. I
> >>> don't mind whichever is preferred here.
> >>> 
> >>> Best,
> >>> 
> >>> Saren
> >>> 
> >>> 
> >>> On 29.03.19 09:29, Martin Maechler wrote:
> >>>>>>>>> Saren Tasciyan
> >>>>>>>>>  on Thu, 28 Mar 2019 17:02:10 +0100 writes:
> >>>>  > Hi,
> >>>>  > I have found a bug in reformulate function and have a 
> solution for it. I
> >>>>  > was wondering, where I can submit it?
> >>>> 
> >>>>  > Best,
> >>>>  > Saren
> >>>> 
> >>>> 
> >>>> Well, you could have given a small reproducible example
> >>>> depicting the bug, notably when posting here:
> >>>> Just a prose text with no R code or other technical content is
>>>> almost always not really appropriate for the R-devel mailing list.
> >>>> 
> >>>> Further, in such a case you should google a bit and hopefully
> >>>> have found

Re: [Rd] Bug in the "reformulate" function in stats package

2019-04-18 Thread Ben Bolker


  Your file didn't make it through the mailing list (which is quite
restrictive about which types/extensions it will take).

  I appreciate your enthusiasm and persistence for this issue, but I
suspect you may have trouble convincing R-core to adopt your changes --
they are "better", "easier", "more intuitive" for you ... but how sure
are you they are completely backward compatible, have no performance
issues, will not break in unusual cases ... ?

  Hopefully someone here will set up a bugzilla account so you can post
your patch/it can be further discussed there, if you want to pursue this ...

  cheers
Ben Bolker

On 2019-04-18 7:30 a.m., Saren Tasciyan wrote:
> Hi,
> 
> Sorry for writing this late, I was very busy. I started this discussion
> here. I wish I could write to bugs.r-project.org, but I don't have an
> account and I will write here instead.
> 
> Meanwhile, I solved my problem with a simpler fix (please see attached
> file).
> 
> This requires that term labels are not "ticked". I think this is better,
> since it is easier to have column names unticked.
> 
> The new development function is IMO unnecessarily complicated. It requires
> strings to be ticked or as.name(). It is more intuitive to have a vector
> of column names.
> 
> Best,
> 
> Saren
> 
> 
> On 05.04.19 09:38, Martin Maechler wrote:
>>>>>>> Ben Bolker
>>>>>>>  on Thu, 4 Apr 2019 12:46:37 -0400 writes:
>>    > Proposed patch
>>
>> Thank you Ben!
>>
>>
>> [the rest is technical nit-picking .. but hopefully interesting
>>   to the smart R-devel reader base:]
>>
>> There was a very subtle thinko in your patch which is not easily
>> diagnosed from R's parse_Rd():
>>
>> Error in
>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",
>>  
>> :
>>    Unexpected end of input (in " quoted string opened at
>> delete.response.Rd:78:63)
>> In addition: Warning message:
>> In
>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",
>>  
>> :
>>    newline within quoted string at delete.response.Rd:74
>>
>> and even I needed more than a minute to find out that the
>> culprit was that
>>
>>    reformulate(sprintf("`%s`", x))
>>
>> is not ok in *.Rd  and must be
>>
>>    reformulate(sprintf("`\%s`", x))
>>
>> -
>>
>>    > (I think .txt files work OK as attachments to the list?)
>>
>> yes, typically -- what really counts is if your e-mail program
>> marks them with MIME-type 'text/plain'
>> and most E-mail programs are very "silly" / "safe" nowadays and
>> don't expect to have smart users  and hence mark (and sometimes
>> encode) everything unknown as non-text.
>>
>> Using very old flexible e-mail interfaces such as Emacs VM allow
>> you to specify the MIME-type in addition to the file *and* it
>> also proposes smart defaults, I think by using something like
>> unix 'file' to determine that your 'foo.diff' file is plain text.
>> {{ .. and we all know that Windows is sillily using file extensions
>>     to determine file type and only knows  Windows-extensions plus
>>     those added explicitly by software installed; so nowadays *.rda
>>     is marked as an Rstudio file ... [argh].
>> }}
>>
>> Martin
>>
>>  > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
>>  >>>>>>> Ben Bolker
>>  >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
>>  >>
>>  >> > I suspect that the issue is addressed (obliquely) in the examples,
>>  >> > which shows that variables with spaces in them (or otherwise
>>  >> > 'non-syntactic', i.e. not satisfying the constraints of legal R
>>  >> > symbols) can be handled by protecting them with backticks (``)
>>  >>
>>  >> > ## using non-syntactic names:
>>  >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
>>  >>
>>  >> > It seems to me there could be room for a *documentation* patch
>>  >> > (stating explicitly that if termlabels has length > 1 its elements
>>  >> > are concatenated with "+", and explicitly stating that non-syntactic
>>  >> > names must be pr

Re: [Rd] Staged installation fail on some file systems

2019-05-05 Thread Ben Bolker


  This happens to me, too, on an Ubuntu virtual machine (with a "vboxsf"
file system, over an underlying MacOS (HFS) file system), but only when
installing as part of R CMD check ... I did find the workaround, so I
don't think I reported it before.
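
  For anyone who just wants to apply the workaround below, something along
these lines should do it (a sketch; I am going from memory of the 3.6.0
NEWS for the INSTALL flag, so double-check the spelling):

Sys.setenv(R_INSTALL_STAGED = "false")   # inherited by R CMD INSTALL subprocesses
## or per call:
install.packages("codetools", INSTALL_opts = "--no-staged-install")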

On 2019-05-04 10:35 p.m., Henrik Bengtsson wrote:
> I'm observing that the new staged installation in R 3.6.0 can produce:
> 
> mv: cannot move
> ‘/home/alice/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-codetools/00new/codetools’
> to ‘/home/alice/R/x86_64-pc-linux-gnu-library/3.6/codetools’: File
> exists
> ERROR:   moving to final location failed
> 
> on some file systems.
> 
> # EXAMPLE
> 
> $ R --vanilla
> R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
> Copyright (C) 2019 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
> ...
> 
>> install.packages("codetools", repos="https://cran.r-project.org")
> Installing package into 
> ‘/wynton/home/cbi/hb/R/x86_64-pc-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.r-project.org/src/contrib/codetools_0.2-16.tar.gz'
> Content type 'application/x-gzip' length 12996 bytes (12 KB)
> ==
> downloaded 12 KB
> 
> * installing *source* package ‘codetools’ ...
> ** package ‘codetools’ successfully unpacked and MD5 sums checked
> ** using staged installation
> ** R
> ** byte-compile and prepare package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded from temporary location
> mv: cannot move
> ‘/home/alice/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-codetools/00new/codetools’
> to ‘/home/alice/R/x86_64-pc-linux-gnu-library/3.6/codetools’: File
> exists
> ERROR:   moving to final location failed
> * removing ‘/home/alice/R/x86_64-pc-linux-gnu-library/3.6/codetools’
> 
> The downloaded source packages are in
> ‘/scratch/alice/Rtmp6UYDzu/downloaded_packages’
> Warning message:
> In install.packages("codetools", repos = "https://cran.r-project.org") :
> installation of package ‘codetools’ had non-zero exit status
> 
> 
> # WORKAROUND
> 
> Disabling staged installation, for instance by setting environment
> variable 'R_INSTALL_STAGED=false' avoids this problem.
> 
> 
> # TROUBLESHOOTING
> 
> I think it comes down to the following call in src/library/tools/R/install.R:
> 
>   status <- system(paste("mv -f",
>  shQuote(instdir),
>  shQuote(dirname(final_instdir
> 
> https://github.com/wch/r-source/blob/d253331f578814f919f150ffdf1fe581618079a3/src/library/tools/R/install.R#L1645-L1647
> 
> which effectively does:
> 
> $ mkdir -p path/pkg  ## empty final destination placeholder(?)
> $ mkdir -p path/to/pkg
> $ mv -f path/to/pkg path
> 
> However, on one (and only one) of several systems I've tested, that
> 'mv' produce the error:
> 
>   mv: cannot move ‘path/to/pkg’ to ‘path/pkg’: File exists
> 
> This is on a BeeGFS parallel file system.  I cannot tell if that 'mv
> -f' should work or not, or if it is even well defined.  FWIW, the
> above 'mv' does indeed work if I switch to another folder that is
> mounted on a different, NFS, file system, i.e. it is not kernel/OS
> specific (here CentOS 7.6.1810).
> 
> If of any use, here's the 'strace' of the above 'mv':
> 
> $ strace mv -f path/to/pkg path
> execve("/usr/bin/mv", ["mv", "-f", "path/to/pkg", "path"], [/* 118 vars */]) 
> = 0
> brk(NULL)   = 0xcf3000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x7fde2ceb1000
> access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or 
> directory)
> open("/usr/lib64/openmpi/lib/tls/x86_64/libselinux.so.1",
> O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> stat("/usr/lib64/openmpi/lib/tls/x86_64", 0x7ffc6a3fb170) = -1 ENOENT
> (No such file or directory)
> open("/usr/lib64/openmpi/lib/tls/libselinux.so.1", O_RDONLY|O_CLOEXEC)
> = -1 ENOENT (No such file or directory)
> stat("/usr/lib64/openmpi/lib/tls", 0x7ffc6a3fb170) = -1 ENOENT (No
> such file or directory)
> open("/usr/lib64/openmpi/lib/x86_64/libselinux.so.1",
> O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> stat("/usr/lib64/openmpi/lib/x86_64", 0x7ffc6a3fb170) = -1 ENOENT (No
> such file or directory)
> open("/usr/lib64/openmpi/lib/libselinux.so.1", O_RDONLY|O_CLOEXEC) =
> -1 ENOENT (No such file or directory)
> stat("/usr/lib64/openmpi/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=96960, ...}) = 0
> mmap(NULL, 96960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fde2ce99000
> close(3)= 0
> open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320i\0\0\0\0\0\0"...,
> 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=155784, ...}) = 0
> mmap(NULL, 2255184, PROT_READ|

Re: [Rd] R optim(method="L-BFGS-B"): unexpected behavior when working with parent environments

2019-05-06 Thread Ben Bolker


  That's consistent/not surprising if the problem lies in the numerical
gradient calculation step ...

On 2019-05-06 10:06 a.m., Ravi Varadhan wrote:
> Optim's Nelder-Mead works correctly for this example.
> 
> 
>> optim(par=10, fn=fn, method="Nelder-Mead")
> x=10, ret=100.02 (memory)
> x=11, ret=121 (calculate)
> x=9, ret=81 (calculate)
> x=8, ret=64 (calculate)
> x=6, ret=36 (calculate)
> x=4, ret=16 (calculate)
> x=0, ret=0 (calculate)
> x=-4, ret=16 (calculate)
> x=-4, ret=16 (memory)
> x=2, ret=4 (calculate)
> x=-2, ret=4 (calculate)
> x=1, ret=1 (calculate)
> x=-1, ret=1 (calculate)
> x=0.5, ret=0.25 (calculate)
> x=-0.5, ret=0.25 (calculate)
> x=0.25, ret=0.0625 (calculate)
> x=-0.25, ret=0.0625 (calculate)
> x=0.125, ret=0.015625 (calculate)
> x=-0.125, ret=0.015625 (calculate)
> x=0.0625, ret=0.00390625 (calculate)
> x=-0.0625, ret=0.00390625 (calculate)
> x=0.03125, ret=0.0009765625 (calculate)
> x=-0.03125, ret=0.0009765625 (calculate)
> x=0.015625, ret=0.0002441406 (calculate)
> x=-0.015625, ret=0.0002441406 (calculate)
> x=0.0078125, ret=6.103516e-05 (calculate)
> x=-0.0078125, ret=6.103516e-05 (calculate)
> x=0.00390625, ret=1.525879e-05 (calculate)
> x=-0.00390625, ret=1.525879e-05 (calculate)
> x=0.001953125, ret=3.814697e-06 (calculate)
> x=-0.001953125, ret=3.814697e-06 (calculate)
> x=0.0009765625, ret=9.536743e-07 (calculate)
> $par
> [1] 0
> 
> $value
> [1] 0
> 
> $counts
> function gradient
>   32   NA
> 
> $convergence
> [1] 0
> 
> $message
> NULL
> 
> 
> 
> 
> 
> From: R-devel  on behalf of Duncan Murdoch 
> 
> Sent: Friday, May 3, 2019 8:18:44 AM
> To: peter dalgaard
> Cc: Florian Gerber; r-devel@r-project.org
> Subject: Re: [Rd] R optim(method="L-BFGS-B"): unexpected behavior when 
> working with parent environments
> 
> 
> It looks as though this happens when calculating numerical gradients:  x
> is reduced by eps, and fn is called; then x is increased by eps, and fn
> is called again.  No check is made that x has other references after the
> first call to fn.
> 
> I'll put together a patch if nobody else gets there first...
> 
> Duncan Murdoch
> 
> On 03/05/2019 7:13 a.m., peter dalgaard wrote:
>> Yes, I think you are right. I was at first confused by the fact that after 
>> the optim() call,
>>
>>> environment(fn)$xx
>> [1] 10
>>> environment(fn)$ret
>> [1] 100.02
>>
>> so not 9.999, but this could come from x being assigned the final value 
>> without calling fn.
>>
>> -pd
>>
>>
>>> On 3 May 2019, at 11:58 , Duncan Murdoch  wrote:
>>>
>>> Your results below make it look like a bug in optim():  it is not 
>>> duplicating a value when it should, so changes to x affect xx as well.
>>>
>>> Duncan Murdoch
>>>
>>> On 03/05/2019 4:41 a.m., Serguei Sokol wrote:
 On 03/05/2019 10:31, Serguei Sokol wrote:
> On 02/05/2019 21:35, Florian Gerber wrote:
>> Dear all,
>>
>> when using optim() for a function that uses the parent environment, I
>> see the following unexpected behavior:
>>
>> makeFn <- function(){
>>   xx <- ret <- NA
>>   fn <- function(x){
>>  if(!is.na(xx) && x==xx){
>>  cat("x=", xx, ", ret=", ret, " (memory)", fill=TRUE, sep="")
>>  return(ret)
>>  }
>>  xx <<- x; ret <<- sum(x^2)
>>  cat("x=", xx, ", ret=", ret, " (calculate)", fill=TRUE, sep="")
>>  ret
>>   }
>>   fn
>> }
>> fn <- makeFn()
>> optim(par=10, fn=fn, method="L-BFGS-B")
>> # x=10, ret=100 (calculate)
>> # x=10.001, ret=100.02 (calculate)
>> # x=9.999, ret=100.02 (memory)
>> # $par
>> # [1] 10
>> #
>> # $value
>> # [1] 100
>> # (...)
>>
>> I would expect that optim() does more than 3 function evaluations and
>> that the optimization converges to 0.
>>
>> Same problem with optim(par=10, fn=fn, method="BFGS").
>>
>> Any ideas?
> I don't have an answer but may be an insight. For some mysterious
> reason xx is getting changed when in should not. Consider:
>> fn=local({n=0; xx=ret=NA; function(x) {n <<- n+1; cat(n, "in
> x,xx,ret=", x, xx, ret, "\n"); if (!is.na(xx) && x==xx) ret else {xx
> <<- x; ret <<- x**2; cat("out x,xx,ret=", x, xx, ret, "\n"); ret}}})
>> optim(par=10, fn=fn, method="L-BFGS-B")
> 1 in x,xx,ret= 10 NA NA
> out x,xx,ret= 10 10 100
> 2 in x,xx,ret= 10.001 10 100
> out x,xx,ret= 10.001 10.001 100.02
> 3 in x,xx,ret= 9.999 9.999 100.02
> $par
> [1] 10
>
> $value
> [1] 100
>
> $counts
> function gradient
> 11
>
> $convergence
> [1] 0
>
> $message
> [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
>
> At the third call, xx has value 9.999 while it should have kept the
> value 10.001.
>
 A little follow-up: if you untie the link between xx and x by replacing
 the expression "xx <<- x"
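
A sketch of that idea (mine, assuming the in-place gradient tweak is the
culprit): force a copy of x before caching it, so the cache no longer
aliases the vector that optim modifies.

makeFn <- function() {
  xx <- ret <- NA
  function(x) {
    if (!anyNA(xx) && all(x == xx)) return(ret)
    xx <<- x + 0    # '+ 0' allocates a fresh vector, untying the alias
    ret <<- sum(x^2)
    ret
  }
}
fn <- makeFn()
optim(par = 10, fn = fn, method = "L-BFGS-B")$par   # should now reach ~0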

Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z

2019-06-23 Thread Ben Bolker


  I agree with many the sentiments about the wisdom of computing very
small p-values (although the example below may win some kind of a prize:
I've seen people talking about p-values of the order of 10^(-2000), but
never 10^(-(10^8)) !).  That said, there are a several tricks for
getting more reasonable sums of very small probabilities.  The first is
to scale the p-values by dividing the *largest* of the probabilities,
then do the (p/sum(p)) computation, then multiply the result (I'm sure
this is described/documented somewhere).  More generally, there are
methods for computing sums on the log scale, e.g.

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html

 I don't know where this has been implemented in the R ecosystem, but
this sort of computation is the basis of the "Brobdingnag" package for
operating on very large ("Brobdingnagian") and very small
("Lilliputian") numbers.

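  The log-scale version of that scaling trick is one line of algebra:
log(sum(exp(lp))) = m + log(sum(exp(lp - m))) with m = max(lp). A minimal
R version (my own sketch, not from any of the packages mentioned):

logsumexp <- function(lp) {          # lp: probabilities on the log scale
  m <- max(lp)
  m + log(sum(exp(lp - m)))          # factoring out exp(m) avoids underflow
}
lbf <- c(-1e6, -1e6 + 3, -1e6 + 1)   # hopelessly small on the raw scale
exp(lbf - logsumexp(lbf))            # normalised weights: ~0.042 0.844 0.114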

On 2019-06-21 6:58 p.m., jing hua zhao wrote:
> Hi Peter, Rui, Christophe and Gabriel,
> 
> Thanks for your inputs --  the use of qnorm(., log=TRUE) is a good point in 
> line with pnorm with which we devised log(p)  as
> 
> log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE)
> 
> that could do really really well for large z compared to Rmpfr. Maybe I am 
> asking too much since
> 
> z <-2
>> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE))
> [1] "1.660579603192917090365313727164e-86858901"
> 
> already gives a rarely seen small p value. I gather I also need a multiple 
> precision exp() and their sum, since exp(z^2/2) is also a Bayes Factor, so I
> get log(x_i)/sum_i log(x_i) instead. To this point, I am obliged to clarify,
> see 
> https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf.
> 
> I agree many feel geneticists go too far with small p values, which I would
> have difficulty arguing against; on the other hand, such values are also to
> be expected in a non-genetic context. For instance, the Framingham study,
> established in 1948, just got $34m for six years of phenotype-wide
> association work, which it would be interesting to see.
> 
> Best wishes,
> 
> 
> Jing Hua
> 
> 
> 
> From: peter dalgaard 
> Sent: 21 June 2019 16:24
> To: jing hua zhao
> Cc: Rui Barradas; r-devel@r-project.org
> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
> 
> You may want to look into using the log option to qnorm
> 
> e.g., in round figures:
> 
>> log(1e-300)
> [1] -690.7755
>> qnorm(-691, log=TRUE)
> [1] -37.05315
>> exp(37^2/2)
> [1] 1.881797e+297
>> exp(-37^2/2)
> [1] 5.314068e-298
> 
> Notice that floating point representation cuts out at 1e+/-308 or so. If you 
> want to go outside that range, you may need explicit manipulation of the log 
> values. qnorm() itself seems quite happy with much smaller values:
> 
>> qnorm(-5000, log=TRUE)
> [1] -99.94475
> 
> -pd
> 
>> On 21 Jun 2019, at 17:11 , jing hua zhao  wrote:
>>
>> Dear Rui,
>>
>> Thanks for your quick reply -- this allows me to see the bottom of this. I 
>> was hoping we could have a handle of those p in genmoics such as 1e-300 or 
>> smaller.
>>
>> Best wishes,
>>
>>
>> Jing Hua
>>
>> 
>> From: Rui Barradas 
>> Sent: 21 June 2019 15:03
>> To: jing hua zhao; r-devel@r-project.org
>> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
>>
>> Hello,
>>
>> Well, try it:
>>
>> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05)
>> z <- qnorm(p/2)
>>
>> pnorm(z)
>> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
>> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
>> #[11] 1.110223e-16
>> p/2
>> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
>> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
>> #[11] 1.110223e-16
>>
>> exp(z*z/2)
>> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10
>> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13
>> #[11] 4.314798e+14
>>
>>
>> p is the smallest possible such that 1 + p != 1 and I couldn't find
>> anything to worry about.
>>
>>
>> R version 3.6.0 (2019-04-26)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 19.04
>>
>> Matrix products: default
>> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
>> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>>
>> locale:
>>  [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
>>  [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
>>  [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
>>  [7] LC_PAPER=pt_PT.UTF-8   LC_NAME=C
>>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods
>> [7] base
>>
>> other attached packages:
>>
>> [many packages loaded]
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
Às 15:24 de 21/06/19, jing h

[Rd] trivial typos in man/switch.Rd

2019-07-02 Thread Ben Bolker

  My colleague points out that these typos are probably still present
because almost no-one has the stamina to read that far down in ?switch ...

  cheers
Ben Bolker
Index: switch.Rd
===
--- switch.Rd   (revision 76766)
+++ switch.Rd   (working copy)
@@ -39,7 +39,7 @@
   in which case the next non-missing element is evaluated, so for
   example \code{switch("cc", a = 1, cc =, cd =, d = 2)} evaluates to
   \code{2}.  If there is more than one match, the first matching element
-  is used.  In the case of no match, if there is a unnamed element of
+  is used.  In the case of no match, if there is an unnamed element of
   \code{\dots} its value is returned.  (If there is more than one such
   argument an error is signaled.)
 
@@ -46,7 +46,7 @@
   The first argument is always taken to be \code{EXPR}: if it is named
   its name must (partially) match.
 
-  A warning is signaled if no alternatives are provides, as this is
+  A warning is signaled if no alternatives are provided, as this is
   usually a coding error.
   
   This is implemented as a \link{primitive} function that only evaluates
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: quiet namespace load is noisy

2019-07-23 Thread Ben Bolker


  Does setting message=FALSE in the chunk options of the vignette help?

  Or less preferably, using suppressMessages() ?
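
  Concretely (a sketch; the chunk header is knitr syntax):

## per chunk, in the vignette source:
##   ```{r, message = FALSE}
## or around the individual call:
suppressMessages(requireNamespace("ggplot2", quietly = TRUE))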

On 2019-07-23 9:36 a.m., Lenth, Russell V wrote:
> Lionel,
> 
> Thanks for your response. I understand that method overriding can be a 
> serious issue, but as you say, this is not something that the user can act 
> upon. Yet the message lands at the user’s feet. 
> 
> In my case, the messages are cluttering my package vignettes, and may or may 
> not represent what users see if they themselves run the vignette code, 
> depending on what version of ggplot2, etc. they have. I will certainly update 
> my ggplot2 installation and that will help. But basically I don’t ever want 
> these kinds of messages to appear in my vignettes, so I will seek some other 
> workaround. 
> 
> Russ
> 
> Sent from my iPhone
> 
>> On Jul 23, 2019, at 1:32 AM, Lionel Henry  wrote:
>>
>> Hello,
>>
>> I think `quietly` should only silence normal masking messages
>> intended for users and providing information about normal
>> behaviour, such as masking.  This is not the case here as the
>> message is about overriding of S3 methods, which has global
>> effect and is rather problematic. It may change behaviour of
>> package and script code in unpredictable ways.
>>
>> This is not something that the user can act upon, the developers
>> of the parties involved need to be contacted by users so they can
>> fix it (the developers of the conflicting methods might not be
>> aware if the generic is from a third party package, such as
>> base::print()). In the case of ggplot2 vs rlang, you can update
>> ggplot2 to the latest version to fix these messages.
>>
>>> After all, other package startup messages ARE suppressed, and
>>> even error messages are suppressed
>>
>> Note that `quietly = TRUE` does not really suppress error
>> messages for missing packages. The errors are converted to a
>> boolean return value, and thus become normal behaviour, for which
>> it makes sense to suppress the message. This does not imply the
>> S3 overriding message should be suppressed as well.
>>
>> Best,
>> Lionel
>>
>>
>>> On 23 Jul 2019, at 06:29, Lenth, Russell V  wrote:
>>>
>>> Dear R-devel,
>>>
>>> Consider the following clip (in R version 3.6.0, Windows):
>>>
 requireNamespace("ggplot2", quietly = TRUE)
>>>   Registered S3 methods overwritten by 'ggplot2':
>>> method from 
>>> [.quosures rlang
>>> c.quosures rlang
>>> print.quosures rlang
>>>
>>> It seems to me that if one specifies 'quietly = TRUE', then messages about 
>>> S3 method overrides should be quieted along with everything else. After 
>>> all, other package startup messages ARE suppressed, and even error messages 
>>> are suppressed:
>>>
 requireNamespace("xyz", quietly = TRUE)
 ## (it is silent even though there is no "xyz" package)
>>>
>>> Thanks
>>>
>>> Russ Lenth
>>> U of Iowa
>>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Underscores in package names

2019-08-09 Thread Ben Bolker


  Creeping code complexity ...

  I like to think that the cuteR names will have a Darwinian
disadvantage in the long run. FWIW Hadley Wickham argues (rightly, I
think) against mixed-case names:
http://r-pkgs.had.co.nz/package.html#naming. I too am guilty of picking
mixed-case package names in the past.  Extra credit if the package name
and the standard function have different cases! e.g.
glmmADMB::glmmadmb(), although (a) that wasn't my choice and (b) at
least it was never on CRAN and (c) it wasn't one of the cuteR variety.

  Bonus points for the first analysis of case conventions in existing
CRAN package names ... I'll start.

> a1 <- rownames(available.packages())
> cute <- "[a-z]*R[a-z]*"
> table(grepl(cute,a1))

FALSE  TRUE
12565  2185


On 2019-08-09 2:00 p.m., neonira Arinoem wrote:
> Wouldn't it be better to have a convention that allows lowercase, dash,
> underscore and dot as the only valid characters for new package names, and
> keep the ancient format validation scheme for older package names?
> 
> This could be implemented by a single function, taking a strictNaming_b_1
> parameter which defaults to true. Easy to use, and compliance results will
> vary according to the parameter value, allowing strict compliance for new
> package names and lazy compliance for older ones.
> 
> Doing so allows enforcing a new package name convention while also
> ensuring continuity of compliance for already existing package names.
> 
> Fabien GELINEAU alias Neonira
> 
> Le ven. 9 août 2019 à 18:40, Kevin Wright  a écrit :
> 
>> Please, no.  I'd also like to disallow uppercase letters in package names.
>> For instance, the cuteness of using a capital "R" in package names is
>> outweighed by the annoyance of trying to remember which packages use an
>> upper-case letter.
>>
>> On Thu, Aug 8, 2019 at 9:32 AM Jim Hester 
>> wrote:
>>
>>> Are there technical reasons that package names cannot be snake case?
>>> This seems to be enforced by `.standard_regexps()$valid_package_name`
>>> which currently returns
>>>
>>>"[[:alpha:]][[:alnum:].]*[[:alnum:]]"
>>>
>>> Is there any technical reason this couldn't be altered to accept `_`
>>> as well, e.g.
>>>
>>>   "[[:alpha:]][[:alnum:]._]*[[:alnum:]]"
>>>
>>> I realize that historically `_` has not always been valid in variable
>>> names, but this has now been acceptable for 15+ years (since R 1.9.0 I
>>> believe). Might we also allow underscores for package names?
>>>
>>> Jim
>>>
>>
>>
>> --
>> Kevin Wright
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Underscores in package names

2019-08-09 Thread Ben Bolker


 Ugh, but not *as* ambiguous as the proposed example (you can still
split unambiguously on "_"; yes, you could split on "last _" in
Gabriel's example, but ...)

On 2019-08-09 4:17 p.m., Duncan Murdoch wrote:
> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
>> Note that this proposal would make mypackage_2.3.1 a valid *package
>> name*,
>> whose corresponding tarball name might be mypackage_2.3.1_2.3.2 after a
>> patch. Yes its a silly example, but why allow that kind of ambiguity?
>>
> CRAN already has a package named "FuzzyNumbers.Ext.2", whose tarball is
> FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've already lost that game.
> 
> Duncan Murdoch
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] depending on orphaned packages?

2019-09-24 Thread Ben Bolker
SuppDists is orphaned on CRAN (and has been since 2013).

https://cran.r-project.org/web/checks/check_results_.html

 Oddly, the simulate method for the inverse.gaussian family
[inverse.gaussian()$simulate] depends (in a loose sense) on SuppDists
(it fails if the SuppDists namespace is not available:

if (!requireNamespace("SuppDists", quietly = TRUE))
stop("need CRAN package 'SuppDists' for simulation from the
'inverse.gaussian' family")


  The statmod package also implements inverse gaussian d/p/q/r functions
<https://journal.r-project.org/archive/2016-1/giner-smyth.pdf>.  It is
lightweight (depends on R >= 3.0.0, imports only base packages [stats
and graphics]) and has been around for a long time (archived versions on
CRAN go back to 2003).

  Would it make sense to replace the call to SuppDists::rinvGauss with a
corresponding call to statmod::rinvgauss ?  Would a patch be considered?

  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] depending on orphaned packages?

2019-09-29 Thread Ben Bolker



On 2019-09-25 3:26 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Tue, 24 Sep 2019 20:09:55 -0400 writes:
> 
> > SuppDists is orphaned on CRAN (and has been since 2013).
> > https://cran.r-project.org/web/checks/check_results_.html
> 
> > Oddly, the simulate method for the inverse.gaussian family
> > [inverse.gaussian()$simulate] depends (in a loose sense) on SuppDists
> > (it fails if the SuppDists namespace is not available:
> 
> > if (!requireNamespace("SuppDists", quietly = TRUE))
> > stop("need CRAN package 'SuppDists' for simulation from the
> > 'inverse.gaussian' family")
> 
> 
> > The statmod package also implements inverse gaussian d/p/q/r functions
> > <https://journal.r-project.org/archive/2016-1/giner-smyth.pdf>.  It is
> > lightweight (depends on R >= 3.0.0, imports only base packages [stats
> > and graphics]) and has been around for a long time (archived versions on
> > CRAN go back to 2003).
> 
> > Would it make sense to replace the call to SuppDists::rinvGauss with a
> > corresponding call to statmod::rinvgauss ?  Would a patch be considered?
> 
> > Ben Bolker
> 
> I'd say "yes" & "yes".
> 
> "Base" code weakly depending on CRAN packages (apart from
> formally 'Recommended' ones)  is somewhat sub-optimal in any
> case, ((but possibly still the best thing, given reality
> [maintenance efforts, copyrights, ...])),
> but your proposal seems a  "uniformly not worse"  change
> ((and I have very much liked delving into parts of Gordon
>   Smyth's textbook on GLMs as a really nice mixture / in-between
>   of rigorous math and applied stats))

   I did actually think of a reason *not* to do this.

   The resulting random deviates generated by statmod::rinvgauss aren't
exactly the same as those from SuppDists::rinvGauss (same algorithm, but
I guess they use sufficiently different internal machinery), so this
could break exact backward compatibility for any code that uses
simulate() for inverse-Gaussian models.  Still might be worth doing, but
now the change is *not* "uniformly not worse".
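
   A concrete sketch of that point (argument names quoted from memory of
the two help pages, so double-check them):

set.seed(101); a <- SuppDists::rinvGauss(3, nu = 1, lambda = 1)
set.seed(101); b <- statmod::rinvgauss(3, mean = 1, shape = 1)
all.equal(a, b)   # same distribution, but not the same stream of deviates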

An alternative (which would remove the dependency on a CRAN package)
would be to pull the code of statmod::rinvgauss into R (which would be
allowed - statmod is GPL 2/3 - but of course it would be polite to ask).
The downside to this solution would be adding the maintenance burden of
this code ...

> 
> Martin Maechler
> ETH Zurich and R Core
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistencies in wilcox.test

2019-12-07 Thread Ben Bolker


 Your second issue seems like a more or less unavoidable floating-point
computation issue.  The paired test operates by computing differences
between corresponding values of x and y.

  It's not impossible to try to detect "almost-ties" (by testing for
differences less than, say, sqrt(.Machine$double.eps)), but it's
delicate and somewhat subjective/problem-dependent.

  Example:

> options(digits=20)
> unique(c(4,3,2)-c(3,2,1))
[1] 1
> unique(c(0.4,0.3,0.2)-c(0.3,0.2,0.1))
[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551
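
For what it's worth, the kind of "almost-tie" detection described above
could look like this (my sketch; the threshold is the subjective part):

d <- c(0.4, 0.3, 0.2) - c(0.3, 0.2, 0.1)
tol <- sqrt(.Machine$double.eps)
near <- outer(d, d, function(a, b) abs(a - b) < tol)
any(near[lower.tri(near)])  # TRUE: the three differences tie up to rounding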

On 2019-12-07 1:55 p.m., Karolis Koncevičius wrote:
> Hello,
> 
> Writing to share some things I've found about wilcox.test() that seem a
> a bit inconsistent.
> 
> 1. Inf values are not removed if paired=TRUE
> 
> # returns different results (Inf is removed):
> wilcox.test(c(1,2,3,4), c(0,9,8,7))
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> 
> # returns the same result (Inf is left as value with highest rank):
> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
> 
> 2. tolerance issues with paired=TRUE.
> 
> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> # ...
> # Warning:  cannot compute exact p-value with ties
> 
> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> # ...
> # no warning
> 
> 3. Always 'x observations are missing' when paired=TRUE
> 
> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> # ...
> # Error:  not enough (finite) 'x' observations
> 
> 4. No indication if normal approximation was used:
> 
> # different numbers, but same "method" name
> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> 
> 
> From all of these I am pretty sure the 1st one is likely unintended,
> so attaching a small patch to adjust it. Can also try patching others if
> consensus is reached that the behaviour has to be modified.
> 
> Kind regards,
> Karolis Koncevičius.
> 
> ---
> 
> Index: wilcox.test.R
> ===
> --- wilcox.test.R  (revision 77540)
> +++ wilcox.test.R  (working copy)
> @@ -42,7 +42,7 @@
>  if(paired) {
>  if(length(x) != length(y))
>  stop("'x' and 'y' must have the same length")
> -    OK <- complete.cases(x, y)
> +    OK <- is.finite(x) & is.finite(y)
>  x <- x[OK] - y[OK]
>  y <- NULL
>  }
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as-cran issue

2020-01-13 Thread Ben Bolker
  From R NEWS (changes in 3.6.0)

Experimentally, setting environment variable _R_CHECK_LENGTH_1_LOGIC2_
will lead to warnings (or errors if the variable is set to a ‘true’
value) when && or || encounter and use arguments of length more than one.
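
For example, to reproduce outside of R CMD check (a sketch; the exact
message wording may differ across R versions):

env _R_CHECK_LENGTH_1_LOGIC2_=true R --vanilla -e 'c(TRUE, FALSE) && TRUE'
## errors on the length-2 left-hand side instead of silently
## using only its first element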

On 2020-01-13 11:46 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> Thanks for the feedback Dirk.   I sent my follow-up before I saw it.
> 
> Looking at the source code, it appears that there is no options() call
> to turn this on. Nor does "R --help" reveal a command line option.
> How then does a user turn this on outside of the R CMD check
> environment, so as to chase things like this down?
> 
> The fact that 1. renaming my function makes the error go away, 2. my
> function is just a wrapper to inherits(), and 3. its a new error in code
> that hasn't changed, all point me towards some oddity with the check
> function.
> 
> Terry
> 
> 
> On 1/13/20 10:22 AM, Dirk Eddelbuettel wrote:
>>
>> On 13 January 2020 at 10:02, Therneau, Terry M., Ph.D. via R-devel wrote:
>> | Where can I find out (and replicate) what options as-cran turns on?
>>
>> See the file src/library/tools/R/check.R in the R sources, and grep for
>> as_cran which is the internal variable controlled by the --as-cran option
>>
>> [...]
>>
>> | The check log contains multiple instances of the lines below:
>> |
>> | < Warning message:
>> | < In if (ismat(kmat)) { :
>> | <   the condition has length > 1 and only the first element will be used
>> |
>> | I don't see how the error could arise, but if I know what as-cran is
>> doing perhaps I can
>> | replicate it.
>>
>> This was widely discussed on this list and should also be in the NEWS
>> file.
>>
>> The change is about what the message says: the if () tests a scalar
>> logical,
>> it appears that ismat(kmat) returns more than a scalar.
>>
>> There has always been an opt-in for this to error -- cf many messages
>> by Henrik
>> over the years as he tried to convince us all to use it more.
>>
>>
>> Dirk
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Ben Bolker


 Ugh, sounds like competing priorities.

  * maintain type consistency
  * minimize storage (= current version, since 3.0.0)
  * maximize utility for large lambda (= proposed change)
  * keep user interface, and code, simple (e.g., it would be easy enough
to add a switch that provided user control of int vs double return value)
  * backward compatibility
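
For reference, the behavior that started the thread (as of R 3.6.x),
once lambda pushes values past .Machine$integer.max:

> rpois(3, 1e10)
[1] NA NA NA
Warning message:
In rpois(3, 1e10) : NAs produced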



On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>> Benjamin Tyner 
>> on Mon, 20 Jan 2020 08:10:49 -0500 writes:
> 
> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >> Coming late here -- after enjoying a proper weekend ;-) --
> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >> "design bug" that  rpois() {and similar} must return return typeof() 
> "integer".
> >> 
> >> More strongly, I'm actually pretty convinced they should return
> >> (integer-valued) double instead of NA_integer_   and for that
> >> reason should always return double:
> >> Even if we have (hopefully) a native 64bit integer in R,
> >> 2^64 is still teeny tiny compared to .Machine$double.xmax
> >> 
> >> (and then maybe we'd have .Machine$longdouble.max  which would
> >> be considerably larger than double.max unless on Windows, where
> >> the wise men at Microsoft decided to keep their workload simple
> >> by defining "long double := double" - as 'long double'
> >> unfortunately is not well defined by C standards)
> >> 
> >> Martin
> >> 
> > Martin if you are in favor, then certainly no objection from me! ;-)
> 
> > So now what about other discrete distributions e.g. could a similar 
> > enhancement apply here?
> 
> 
> >> rgeom(10L, 1e-10)
> >  [1] NA 1503061294 NA NA 1122447583 NA
> >  [7] NA NA NA NA
> > Warning message:
> > In rgeom(10L, 1e-10) : NAs produced
> 
> yes, of course there are several such distributions.
> 
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
> 
> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
> 
> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>   rsignrank() and rwilcox() now return integer (not double)
>   vectors.  This halves the storage requirements for large
>   simulations.
> 
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable. 
> 
> Martin
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 3.6.3 scheduled for February 29

2020-02-06 Thread Ben Bolker
  Because it's the fifth recurrence of the date (29 February): R 1.0.0
was released on 2000-02-29, and leap day has come around only five times
since.
  https://en.wikipedia.org/wiki/The_Pirates_of_Penzance

On Thu, Feb 6, 2020 at 3:32 PM Abby Spurdle  wrote:
>
> Congratulations!
>
> > celebrate (beeR=TRUE, loud.music=FALSE,
> nbeeRs=2L,
> proportion.of.tech.talk=0.4)
>
> Why is it the 5th anniversary and not the 20th anniversary?
>
>
> On Fri, Feb 7, 2020 at 4:58 AM Peter Dalgaard via R-devel
>  wrote:
> >
> > Full schedule is available on developer.r-project.org.
> >
> > (The date is chosen to celebrate the 5th anniversary of R 1.0.0. Some 
> > irregularity may occur on the release day, since this happens to be a 
> > Saturday and the release manager is speaking at the CelebRation2020 
> > event...)
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd@cbs.dk  Priv: pda...@gmail.com
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in man page Quote.Rd

2020-02-21 Thread Ben Bolker


  Attn: someone on R-core:

  "ran" should be "can".

  Also, thanks for this feature!

Index: Quotes.Rd
===
--- Quotes.Rd   (revision 77845)
+++ Quotes.Rd   (working copy)
@@ -74,7 +74,7 @@
   Raw character constants are also available using a syntax similar to
   the one used in C++: \code{r"(...)"} with \code{...} any character
   sequence, except that it must not contain the closing sequence
-  \samp{)"}. The delimiter pairs \code{[]} and \code{\{\}} ran also be
+  \samp{)"}. The delimiter pairs \code{[]} and \code{\{\}} can also be
   used. For  additional flexibility, a number of dashes can be placed
   between the opening quote and the opening delimiter, as long as the same
   number of dashes appear between the closing delimiter and the closing
quote.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials issue, a heads up

2020-02-24 Thread Ben Bolker
In the long run, coming up with a way to parse specials in formulas
that is both clean and robust is a good idea - annoying users are a
little bit like CRAN maintainers in this respect. I think I would
probably do this by testing identical(eval(extracted_head),
survival::Surv) - but this has lots of potential annoyances (what if
extracted_head is a symbol that can't be found in any attached
environment?  Do we have to start with
if (length(find(deparse(extracted_head))) > 0) ?
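
Concretely, something like this (an untested sketch; is_special() is a
hypothetical helper, and it can only recognize functions that are
actually findable from the evaluation environment):

is_special <- function(expr, target) {
  if (!is.call(expr)) return(FALSE)
  fn <- tryCatch(eval(expr[[1]]), error = function(e) NULL)
  !is.null(fn) && identical(fn, target)
}
## is_special(quote(survival::strata(inst)), survival::strata)  ## TRUE
## is_special(quote(strata(inst)), survival::strata)  ## TRUE if survival is attached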

In the short run, a clear note in the documentation seems entirely sufficient.

On Mon, Feb 24, 2020 at 12:01 PM Hugh Parsonage
 wrote:
>
> I mean if the person filing the bug regards style as more important than
> the truth of how R treats formulas then they’re literally talking in
> another language.
>
> I strongly recommend you do nothing or at most make a note in the
> documentation addressing this. Your time is too valuable.
>
> On Tue, 25 Feb 2020 at 12:56 am, Therneau, Terry M., Ph.D. via R-devel <
> r-devel@r-project.org> wrote:
>
> > I recently had a long argument wrt the survival package, namely that the
> > following code
> > didn't do what they expected, and so they reported it as a bug
> >
> >survival::coxph( survival::Surv(time, status) ~ age + sex +
> > survival::strata(inst),
> > data=lung)
> >
> > a. The Google R style guide  recommends that one put :: everywhere
> > b. This breaks the recognition of cluster as a "special" in the terms
> > function.
> >
> > I've been stubborn and said that their misunderstanding of how formulas
> > work is not my
> > problem.   But I'm sure that the issue will come up again, and multiple
> > other packages
> > will break.
> >
> > A big problem is that the code runs, it just gives the wrong answer.
> >
> > Suggestions?
> >
> > Terry T.
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dput()

2020-02-29 Thread Ben Bolker


 I think Robin knows about FAQ 7.31/floating point (author of
'Brobdingnag', among other numerical packages).  I agree that this is
surprising (to me).

  To reframe this question: is there way to get an *exact* ASCII
representation of a numeric value (i.e., guaranteeing the restored value
is identical() to the original) ?

 .deparseOpts has

‘"digits17"’: Real and finite complex numbers are output using
  format ‘"%.17g"’ which may give more precision than the
  default (but the output will depend on the platform and there
  may be loss of precision when read back).

  ... but this still doesn't guarantee that all precision is kept.

  Maybe

saveRDS(x, textConnection("out", "w"), ascii = TRUE)
identical(x, as.numeric(out[length(out)]))   ## TRUE

?
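
 Another thought, if I'm remembering .deparseOpts correctly: the
"hexNumeric" control writes doubles in C99 hex-float notation, which
should round-trip exactly on platforms that support it.

x <- sum(dbinom(0:20, 20, 0.35))
out <- deparse(x, control = "hexNumeric")
identical(x, eval(parse(text = out)))   ## TRUE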




On 2020-02-29 2:42 a.m., Rui Barradas wrote:
> Hello,
> 
> FAQ 7.31
> 
> See also this StackOverflow post:
> 
> https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal
> 
> Hope this helps,
> 
> Rui Barradas
> 
> Às 00:08 de 29/02/20, robin hankin escreveu:
>> My interpretation of dput.Rd is that dput() gives an exact ASCII form
>> of the internal representation of an R object.  But:
>>
>>   rhankin@cuttlefish:~ $ R --version
>> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
>> Copyright (C) 2019 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> [snip]
>>
>> rhankin@cuttlefish:~ $ R --vanilla --quiet
>>> x <- sum(dbinom(0:20,20,0.35))
>>> dput(x)
>> 1
>>> x-1
>> [1] -4.440892e-16
>>>
>>> x==1
>> [1] FALSE
>>>
>>
>> So, dput(x) gives 1, but x is not equal to 1.  Can anyone advise?
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] survival bug?

2020-03-03 Thread Ben Bolker


  Microsoft offers fully-provisioned but time-limited developer images
for Windows 10 (I think they last for 3 months) for most major VM
platforms (including VirtualBox, which is the one I currently use).
There would certainly be a start-up cost in effort, but probably not any
financial cost.

   cheers
Ben Bolker

On 2020-03-03 4:02 p.m., Gabriel Becker wrote:
> Hi Terry,
> 
> http://win-builder.r-project.org/ and the rhub build service (which can be
> invoked by the rhub package) allow on demand checks in windows
> environments, though for active debugging the iteration time can be quite
> painful.
> 
> If you have access, e.g., through your employer, to a windows license you
> should also be able to do use VMWare or VirtualBox (I can never remember
> which one I like more) to run windows and test that way. This will have
> some start up cost in effort but allows active testing and iteration.
> 
> Hope that helps,
> ~G
> 
> On Tue, Mar 3, 2020 at 7:00 AM Therneau, Terry M., Ph.D. via R-devel <
> r-devel@r-project.org> wrote:
> 
>> My latest submission of survival3.1-10 to CRAN fails  a check, but only on
>> windows, which
>> I don't use.
>> How do I track this down?
>> The test in question works fine on my Linux box.
>>
>> Terry
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] help with rchk warnings on Rf_eval(Rf_lang2(...))

2020-03-23 Thread Ben Bolker
Dear r-devel folks,

  [if this is more appropriate for r-pkg-devel please let me know and
I'll repost it over there ...]

I'm writing to ask for help with some R/C++ integration idioms that are
used in a package I'm maintaining, that are unfamilar to me, and that
are now being flagged as problematic by Tomas Kalibera's 'rchk'
machinery (https://github.com/kalibera/rchk); results are here
https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/lme4.out

The problem is with constructions like

::Rf_eval(::Rf_lang2(fun, arg), d_rho)

I *think* this means "construct a two-element pairlist from fun and arg,
then evaluate it in the environment d_rho"

This leads to warnings like

"calling allocating function Rf_eval with argument allocated using Rf_lang2"

Is this a false positive or ... ? Can anyone help interpret this?

Not sure why this idiom was used in the first place: speed? (e.g., see
https://stat.ethz.ch/pipermail/r-devel/2019-June/078020.html ) Should I
be rewriting to avoid Rf_eval entirely in favor of using a Function?
(i.e., as commented in
https://stackoverflow.com/questions/37845012/rcpp-function-slower-than-rf-eval
: "Also, calling Rf_eval() directly from a C++ context is dangerous as R
errors (ie, C longjmps) will bypass the destructors of C++ objects and
leak memory / cause undefined behavior in general. Rcpp::Function tries
to make sure that doesn't happen.")

 Any tips, corrections, pointers to further documentation, etc. would be
most welcome ... Web searching for this stuff hasn't gotten me very far,
and it seems to be deeper than most of the introductory material I can
find (including the Rcpp vignettes) ...

  cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] help with rchk warnings on Rf_eval(Rf_lang2(...))

2020-03-23 Thread Ben Bolker


 Thanks, that's really useful.  One more question for you, or someone
else here:

const ArrayXd glmLink::linkFun(const ArrayXd& mu) const {
    return as<ArrayXd>(::Rf_eval(::Rf_lang2(as<SEXP>(d_linkFun),
                                            as<SEXP>(Rcpp::NumericVector(mu.data(),
                                                                         mu.data() + mu.size()))
                                            ), d_rho));
}


I guess I need that to read something like
PROTECT(::Rf_eval(PROTECT(::Rf_lang2(...)), d_rho)), but as written I
don't seem to have anywhere to squeeze in an UNPROTECT(2).  Do I need
to define a temporary variable so I can UNPROTECT(2) before I return the
value?
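
Something like this, maybe (an untested sketch using the plain C API,
with the same Rcpp::as conversions as the original):

const ArrayXd glmLink::linkFun(const ArrayXd& mu) const {
    SEXP expr = PROTECT(::Rf_lang2(as<SEXP>(d_linkFun),
                                   as<SEXP>(Rcpp::NumericVector(mu.data(),
                                                                mu.data() + mu.size()))));
    SEXP ans = PROTECT(::Rf_eval(expr, d_rho));  // 'expr' stays protected during eval
    ArrayXd ret = as<ArrayXd>(ans);              // copy out before unprotecting
    UNPROTECT(2);
    return ret;
}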

Or is there a way I can use Shield() since this an Rcpp-based project
anyway?

  Sorry for all the very basic questions, but I'm flying nearly blind
here ...

  cheers
   Ben Bolker



On 2020-03-23 4:01 p.m., Tomas Kalibera wrote:
> On 3/23/20 8:39 PM, Ben Bolker wrote:
>> Dear r-devel folks,
>>
>>    [if this is more appropriate for r-pkg-devel please let me know and
>> I'll repost it over there ...]
>>
>> I'm writing to ask for help with some R/C++ integration idioms that are
>> used in a package I'm maintaining, that are unfamilar to me, and that
>> are now being flagged as problematic by Tomas Kalibera's 'rchk'
>> machinery (https://github.com/kalibera/rchk); results are here
>> https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/lme4.out
>>
>>
>> The problem is with constructions like
>>
>> ::Rf_eval(::Rf_lang2(fun, arg), d_rho)
>>
>> I *think* this means "construct a two-element pairlist from fun and arg,
>> then evaluate it within expression d_rho"
>>
>> This leads to warnings like
>>
>> "calling allocating function Rf_eval with argument allocated using
>> Rf_lang2"
>>
>> Is this a false positive or ... ? Can anyone help interpret this?
> This is a true error. You need to protect the argument of eval() before
> calling eval, otherwise eval() could destroy it before using it. This is
> a common rule: whenever passing an argument to a function, that argument
> must be protected (directly or indirectly). Rchk tries to be smart and
> doesn't report a warning when it can be sure that in that particular
> case, for that particular function, it is safe. This is easy to fix,
> just protect the result of lang2() before the call and unprotect (some
> time) after.
>> Not sure why this idiom was used in the first place: speed? (e.g., see
>> https://stat.ethz.ch/pipermail/r-devel/2019-June/078020.html ) Should I
>> be rewriting to avoid Rf_eval entirely in favor of using a Function?
>> (i.e., as commented in
>> https://stackoverflow.com/questions/37845012/rcpp-function-slower-than-rf-eval
>>
>> : "Also, calling Rf_eval() directly from a C++ context is dangerous as R
>> errors (ie, C longjmps) will bypass the destructors of C++ objects and
>> leak memory / cause undefined behavior in general. Rcpp::Function tries
>> to make sure that doesn't happen.")
> 
> Yes, eval (as well as lang2) can throw an error, this error has to be
> caught via R API and handled (e.g. by throwing as exception or something
> else, indeed that exception then needs to be caught and possibly
> converted back when leaving again to C stack frames). An R/C API you can
> use here is R_UnwindProtect. This is of course a bit of a pain, and one
> does not have to worry when programming in plain C.
> 
> I suppose Rcpp provides some wrapper around R_UnwindProtect, that would
> be a question for Rcpp experts/maintainers.
> 
> Best
> Tomas
> 
>>
>>   Any tips, corrections, pointers to further documentation, etc. would be
>> most welcome ... Web searching for this stuff hasn't gotten me very far,
>> and it seems to be deeper than most of the introductory material I can
>> find (including the Rcpp vignettes) ...
>>
>>    cheers
>>     Ben Bolker
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unstable corner of parameter space for qbeta?

2020-03-25 Thread Ben Bolker


  I've discovered an infelicity (I guess) in qbeta(): it's not a bug,
since there's a clear warning about lack of convergence of the numerical
algorithm ("full precision may not have been achieved").  I can work
around this, but I'm curious why it happens and whether there's a better
workaround -- it doesn't seem to be in a particularly extreme corner of
parameter space. It happens, e.g., for  these parameters:

phi <- 1.1
i <- 0.01
t <- 0.001
shape1 = i/phi  ##  0.009090909
shape2 = (1-i)/phi  ## 0.9
qbeta(t,shape1,shape2)  ##  5.562685e-309
##  brute-force uniroot() version, see below
Qbeta0(t,shape1,shape2)  ## 0.9262824

  The qbeta code is pretty scary to read: the warning "full precision
may not have been achieved" is triggered here:

https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qbeta.c#L530

  Any thoughts?  Should I report this on the bug list?


A more general illustration:
http://www.math.mcmaster.ca/bolker/misc/qbeta.png

===
fun <- function(phi,i=0.01,t=0.001, f=qbeta) {
  f(t,shape1=i/phi,shape2=(1-i)/phi, lower.tail=FALSE)
}
## brute-force beta quantile function
Qbeta0 <- function(t,shape1,shape2,lower.tail=FALSE) {
  fn <- function(x) {pbeta(x,shape1,shape2,lower.tail=lower.tail)-t}
  uniroot(fn,interval=c(0,1))$root
}
Qbeta <- Vectorize(Qbeta0,c("t","shape1","shape2"))
curve(fun,from=1,to=4)
curve(fun(x,f=Qbeta),add=TRUE,col=2)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unstable corner of parameter space for qbeta?

2020-03-26 Thread Ben Bolker



On 2020-03-26 4:02 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Wed, 25 Mar 2020 21:09:16 -0400 writes:
> 
> > I've discovered an infelicity (I guess) in qbeta(): it's not a bug,
> > since there's a clear warning about lack of convergence of the numerical
> > algorithm ("full precision may not have been achieved").  I can work
> > around this, but I'm curious why it happens and whether there's a better
> > workaround -- it doesn't seem to be in a particularly extreme corner of
> > parameter space. It happens, e.g., for  these parameters:
> 
> > phi <- 1.1
> > i <- 0.01
> > t <- 0.001
> > shape1 = i/phi  ##  0.009090909
> > shape2 = (1-i)/phi  ## 0.9
> > qbeta(t,shape1,shape2)  ##  5.562685e-309
> > ##  brute-force uniroot() version, see below
> > Qbeta0(t,shape1,shape2)  ## 0.9262824
> 
> > The qbeta code is pretty scary to read: the warning "full precision
> > may not have been achieved" is triggered here:
> 
> > 
> https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qbeta.c#L530
> 
> > Any thoughts?
> 
> Well,  qbeta() is mostly based on inverting pbeta()  and pbeta()
> has *several* "dangerous" corners in its parameter spaces
> {in some cases, it makes sense to look at the 4 different cases
>  log.p = TRUE/FALSE  //  lower.tail = TRUE/FALSE  separately ..}
> 
> pbeta() itself is based on the most complex numerical code in
> all of base R, i.e., src/nmath/toms708.c  and that algorithm
> (TOMS 708) had been sophisticated already when it was published,
> and it has been improved and tweaked several times since being
> part of R, notably for the log.p=TRUE case which had not been in
> the focus of the publication and its algorithm.
> [[ NB: part of this you can read when reading  help(pbeta)  to the end ! ]]
> 
> I've spent many "man weeks", or even "man months" on pbeta() and
> qbeta(), already and have dreamed to get a good student do a
> master's thesis about the problem and potential solutions I've
> looked into in the mean time.
> 
> My current gut feeling is that in some cases, new approximations
> are necessary (i.e. tweaking of current approximations is not
> going to help sufficiently).
> 
> Also note (in the R sources)  tests/p-qbeta-strict-tst.R,
> a whole file of "regression tests" about  pbeta() and qbeta()
> {where part of the true values have been computed with my CRAN
> package Rmpfr (for high precision computation) with the
> Rmpfr::pbetaI() function which gives arbitrarily precise pbeta()
> values but only when  (a,b) are integers -- that's the "I" in pbetaI().
> 
> Yes, it's intriguing ... and I'll look into your special
> findings a bit later today.
> 
> 
>   > Should I report this on the bug list?
> 
> Yes, please.  Not all problem of pbeta() / qbeta() are part yet,
> of R's bugzilla data base,  and maybe this will help to draw
> more good applied mathematicians look into it.

  Will report.

  I'm not at all surprised that this is a super-tough problem.  The only
part that was surprising to me was that my naive uniroot-based solution
worked for this particular corner of parameter space where qbeta() has
trouble (it was terrible elsewhere), so now I'm using a hybrid solution
where I fall back on my brute-force uniroot thing if I get a warning
from qbeta().
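
Roughly like this (a sketch; the real version also needs to handle
lower.tail, vectorization, etc.):

qbeta_hybrid <- function(p, shape1, shape2) {
  warned <- FALSE
  val <- withCallingHandlers(
    qbeta(p, shape1, shape2),
    warning = function(w) { warned <<- TRUE; invokeRestart("muffleWarning") }
  )
  if (!warned) return(val)
  ## fall back on brute-force root-finding of the CDF
  uniroot(function(x) pbeta(x, shape1, shape2) - p, interval = c(0, 1))$root
}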

I hesitated to even bring it up because I know you're really busy, but I
figured it was better to tag it now and let you deal with it some time
later.

Bugzilla report at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17746

  cheers
   Ben Bolker


> 
> 
> 
> Martin Maechler
> ETH Zurich and R Core team
> (I'd call myself the "dpq-hacker" within R core -- related to
>  my CRAN package 'DPQ')
> 
> 
> > A more general illustration:
> > http://www.math.mcmaster.ca/bolker/misc/qbeta.png
> 
> > ===
> > fun <- function(phi,i=0.01,t=0.001, f=qbeta) {
> > f(t,shape1=i/phi,shape2=(1-i)/phi, lower.tail=FALSE)
> > }
> > ## brute-force beta quantile function
> > Qbeta0 <- function(t,shape1,shape2,lower.tail=FALSE) {
> > fn <- function(x) {pbeta(x,shape1,shape2,lower.tail=lower.tail)-t}
> > uniroot(fn,interval=c(0,1))$root
> > }
> > Qbeta <- Vectorize(Qbeta0,c("t","shape1","shape2"))
> > curve(fun,from=1,to=4)
> > curve(fun(x,f=Qbeta),add=TRUE,col=2)
> 
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] closing R graphics windows?

2020-05-26 Thread Ben Bolker



   Does anyone have any idea how hard it would be/where to start if one 
wanted to hack/patch R to allow X11 graphics windows that had keyboard 
focus to be closed with standard keyboard shortcuts (e.g. Ctrl-W to 
close on Linux)?  Has this been suggested/tried before?


   cheers

    Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Disappearance of the file "NEWS"

2010-08-01 Thread Ben Bolker
Laurent  gmail.com> writes:

> 
> Dear R-developpers,
> 
> The file NEWS disappeared in r5243, and the authoritative source of 
> information for what has changed in R is in ./doc/NEWS.Rd.
> 
> A quick glance at NEWS was extremely helpful for knowing what has 
> changed, and whether building a (more) recent development version was 
> needed to test changes, or verify that they would not break existing code.

 [snip to make Gmane happy]

  Agreed.  This change also breaks the link on the R developers
page:

https://svn.r-project.org/R/trunk/NEWS

although one can still, sort of, get to the information via

http://developer.r-project.org/RSSfeeds.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] adding a built-in drop.levels option for subset() in 2.12 ?

2010-08-15 Thread Ben Bolker

  With the approach of R 2.12.0:

  with mild apologies for re-opening this perennial issue:
is there any hope, if appropriate patches are submitted, of adding a
drop.levels argument (with default equal to FALSE to preserve backward
compatibility/efficiency) to the subset function ... ?
  If not, would a patch to the documentation and/or the R FAQ be accepted?
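
  For concreteness, the behavior in question and the classic workaround:

d <- data.frame(f = factor(c("a", "b", "c")), x = 1:3)
d2 <- subset(d, f != "c")
levels(d2$f)          ## still "a" "b" "c" -- the unused level is kept
d2$f <- factor(d2$f)  ## workaround: re-create the factor to drop it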

  This does seem to be a continuing source of confusion/frustration (it
certainly is among my students, and here is some documentation from
r-help over the years).  Note that some of the earliest threads here
refer to the problem (now fixed) that the subset() documentation failed
to note that the existing 'drop' argument would *not* (confusingly) drop
unused levels.

http://finzi.psych.upenn.edu/Rhelp10/2008-April/158566.html
http://finzi.psych.upenn.edu/R/Rhelp02/archive/42976.html
http://finzi.psych.upenn.edu/R/Rhelp02/archive/36961.html
http://finzi.psych.upenn.edu/Rhelp10/2009-November/217878.html
http://article.gmane.org/gmane.comp.lang.r.general/200395

  This suggestion is milder and less wide-ranging than a global
drop.unused.levels option, or than convincing everyone to use strings
rather than factors most of the time ...

  cheers
Ben Bolker


-- 
Ben Bolker
bbol...@gmail.com , bol...@mcmaster.ca
http://www.math.mcmaster.ca/~bolker
GPG key: http://www.math.mcmaster.ca/~bolker/benbolker-publickey.asc

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-21 Thread Ben Bolker
work.
> 
> C. If you are writing code for R itself, or if you are developing a
>package, send your question to r-devel, rather than r-help.
> 
> D. For operating-system or R interface questions, there are dedicated
> lists. See R-sig-Mac, R-sig-Debian, R-sig-Fedora, etc.
> 
> ======
> 
> It will be necessary to add, toward the end, the part about "be polite
> when posting."
> 
> And along the lines of the "No RTFM" policy, I think we should say
> "All RTFM answers should include an exact document and section
> number." It is certainly insulting to answer a question about plot
> with the one line
> 
> ?plot
> 
> but it is not insulting to say "In ?plot, check the "Details" section
> and run the example code."

  Is there any point posting this on the Wiki and letting people
hack at it?

  Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] doc bug in ?residuals.gls

2010-09-14 Thread Ben Bolker

Under the description of the 'type' argument, ?residuals.gls says
'Defaults to ‘"pearson"’.'

But residuals.gls starts

residuals.gls <-
function(object, type = c("response", "pearson", "normalized"), ...)
{
type <- match.arg(type)

...

which sure looks to me like it defaults to "response", not "pearson"
(and it behaves that way in tests).

It would seem to make more sense to change the documentation rather than
the code, since anyone who looked at the docs would have been confused
already, whereas someone who had been happily using the code without
looking at the docs would see a sudden change in the results ...

This is in nlme 3.1-96, from a fresh tools/rsync-recommended. Sending it
to r-devel for comment because r-core is listed as the maintainer.

sincerely
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to set up an own package repository

2010-09-17 Thread Ben Bolker
Janko Thyson  ku-eichstaett.de> writes:

> 
> Dear List,
> 
> I'd like to set up a package repository so I can use install.packages() on
> it for home-grown packages. I set up an AMPP infrastructure on a windows box
> already, but I'm pretty lost with respect to what to do next as I didn't do
> any web-programming/admin yet. Could anyone recommend some r-specific
> tutorials or has a couple of suggestions for me? I've had a look at the
> official R manual, but it just describes the required repository structure,
> but not how to implement that. I'd also be willing to dive into SVN and
> alikes if you think that's best practice.

  It's pretty easy.  For example, I have a directory hierarchy

R/bin/{windows,windows64}
R/src/contrib/

  in each bottom-level directory I have the appropriate zipped
binaries or tar.gz sources, along with a PACKAGES file generated
by the write_PACKAGES function in the tools package.
  
  see section 6.6 of the R-admin manual.
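
  For example (a sketch -- the paths and package name here are made up):

library(tools)
write_PACKAGES("R/src/contrib", type = "source")
write_PACKAGES("R/bin/windows", type = "win.binary")
## then, from a client:
install.packages("mypkg", repos = "file:///path/to/R")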

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] r-forge down?

2010-10-17 Thread Ben Bolker

  anyone have a status report on r-forge ... ?

  From here (Hamilton, ON) I can't ping ...

PING r-forge.wu-wien.ac.at (137.208.57.38) 56(84) bytes of data.

  cheers
    Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in read.table?

2010-11-07 Thread Ben Bolker
  ija.csic.es> writes:

> 
> Thanks. Yes, quote="" solves the problem.
> 
> I would never say, however, from the documentations, that this was causing
> the duplicate records. Rather, I would have expected some kind of
> warning/error message.
> 
> And, yes, I knew that, through duplicate(), R solves gracefully this
> specific problem. Just thought this could be of interests for R devel.
> 

  A bit of a meta- point here: there may indeed be a bug here
(it's the kind of obscure "corner case" that someone may not have
tested), but it's unlikely to get noted as such and fixed unless you
can come up with a clear analysis of what is happening and how the
misinterpretation of quote characters is leading to duplication of
records. (You, or someone else -- recognizing that this may be beyond
your skill level.  It might be that 'just' very careful thought
and analysis of the behavior described in the documentation would
explain this, or one might have to dig through source code in R or C.)
Problems with unescaped/unrecognized quote characters are very
common.

 Otherwise, this will likely be dismissed as a ("doctor, it hurts
when I do this"; "well then, don't do that!") sort of situation.

  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in read.table?

2010-11-07 Thread Ben Bolker
Ben Bolker  gmail.com> writes:

> 
>   ija.csic.es> writes:
> 
> > 
> > Thanks. Yes, quote="" solves the problem.
> > 
> > I would never say, however, from the documentations, that this was causing
> > the duplicate records. Rather, I would have expected some kind of
> > warning/error message.
> > 
> > And, yes, I knew that, through duplicate(), R solves gracefully this
> > specific problem. Just thought this could be of interests for R devel.
> > 
> 
>   A bit of a meta- point here: there may indeed be a bug here
> (it's the kind of obscure "corner case" that someone may not have
> tested), but it's unlikely to get noted as such and fixed unless you
> can come up with a clear analysis of what is happening and how the
> misinterpretation of quote characters is leading to duplication of
> records. (You, or someone else -- recognizing that this may be beyond
> your skill level.  It might be that 'just' very careful thought
> and analysis of the behavior described in the documentation would
> explain this, or one might have to dig through source code in R or C.)
> Problems with unescaped/unrecognized quote characters are very
> common.
> 
>  Otherwise, this will likely be dismissed as a ("doctor, it hurts
> when I do this"; "well then, don't do that!") sort of situation.
> 
>   Ben Bolker

  Following up on my own point: 

The bottom line is that the internal readTableHead() command
handles newlines within quoted strings differently from scan().

  Explanation:

a simpler file that replicates the problem is

a b'c"d"e
f g'h"i"j
k l'm"n"o 

(didn't want to try reading this from a textConnection --
escaping all the quotes properly would have driven me nuts).
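
  One way to reproduce the file with only the double quotes needing
escapes:

writeLines(c("a b'c\"d\"e", "f g'h\"i\"j", "k l'm\"n\"o"), "tmp3.txt")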

 One of the first things that happens in read.table is that
the first few lines are read with readTableHead:

  lines <- .Internal(readTableHead(file, nlines, comment.char, 
   blank.lines.skip, quote, sep))

  in this case, this reads the first two lines as one line;
the single quote at pos. 4 of the first line closes on pos.
4 of the second line, preventing the first new line from
ending a line.

  R then pushes back two copies of the lines that have
been read (this is normal behavior; I don't quite follow the
logic).

  The rest of the file is read with scan(), 1 line at a time.
However, there is the discrepancy between the way
that readTableHead interprets new lines in the middle of
quoted strings (it ignores them) and the way that scan()
interprets them (it takes them as the end of the quoted string).

In particular, if the  file "tmp3.txt" is as shown above, then
the command

.Internal(readTableHead(file("tmp3.txt"),nlines=1L,"#",FALSE,quote="\"'",sep=""))

produces

[1] "a b'c\"d\"e\nf g'h\"i\"j"
 
(i.e. it grabs the first two lines, including the \n)

and 

scan(file("tmp3.txt"),nlines=1L,quote="\"'",what="")

produces

Read 2 items
[1] "a" "b'c\"d\"e"

(it terminates the line in the middle of the string opened
by the single quote).

 I don't know what the consequences would be of changing
readTableHead to match scan()'s behavior, or how much
trouble it would be to do so.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

