date:20190405

Re: [Rd] Bug in the "reformulate" function in stats package

2019-04-05 Thread Martin Maechler

> Ben Bolker 
> on Thu, 4 Apr 2019 12:46:37 -0400 writes:

  > Proposed patch 

Thank you Ben!


[the rest is technical nit-picking .. but hopefully interesting
 to the smart R-devel reader base:]

There was a very subtle thinko in your patch which is not easily
diagnosed from R's parse_Rd():

Error in 
parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  
: 
  Unexpected end of input (in " quoted string opened at 
delete.response.Rd:78:63)
In addition: Warning message:
In 
parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  
:
  newline within quoted string at delete.response.Rd:74

and even I needed more than a minute to find out that the
culprit was that

  reformulate(sprintf("`%s`", x))

is not ok in *.Rd  and must be

  reformulate(sprintf("`\%s`", x))

-

  > (I think .txt files work OK as attachments to the list?)   

yes, typically -- what really counts is if your e-mail program
marks them with MIME-type 'text/plain'
and most E-mail programs are very "silly" / "safe" nowadays and
don't expect to have smart users  and hence mark (and sometimes
encode) everything unknown as non-text. 

Using very old flexible e-mail interfaces such as Emacs VM allow
you to specify the MIME-type in addition to the file *and* it
also proposes smart defaults, I think by using something like
unix 'file' to determine that your 'foo.diff' file is plain text.
{{ .. and we all know that Windows is sillily using file extensions
   to determine file type and only knows  Windows-extensions plus
   those added explicitly by software installed; so nowadays *.rda
   is marked as an Rstudio file ... [argh].
}}

Martin

> On 2019-04-04 2:21 a.m., Martin Maechler wrote:
>>> Ben Bolker 
>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
>> 
>> > I suspect that the issue is addressed (obliquely) in the examples,
>> > which shows that variables with spaces in them (or otherwise
>> > 'non-syntactic', i.e. not satisfying the constraints of legal R 
symbols)
>> > can be handled by protecting them with backticks  (``)
>> 
>> > ## using non-syntactic names:
>> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
>> 
>> > It seems to me there could be room for a *documentation* patch (stating
>> > explicitly that if termlabels has length > 1 its elements are
>> > concatenated with "+", and explicitly stating that non-syntactic names
>> > must be protected with back-ticks).  (There is a little bit of 
obscurity
>> > in the fact that the elements of termlabels don't have to be
>> > syntactically valid names: many will be included in formulas if they 
can
>> > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
>> 
>> > I would be happy to give it a shot if the consensus is that it would
>> > be worthwhile.
>> 
>> I think it would be worthwhile to add to the docs a bit.
>> 
>> [With currently just your and my vote, we have a 100% consensus
>> ;-)]
>> 
>> Martin
>> 
>> > One workaround to the OP's problem is below (may be worth including
>> > as an example in docs)
>> 
>> >> z <- c("a variable","another variable")
>> >> reformulate(z)
>> > Error in parse(text = termtext, keep.source = FALSE) :
>> > :1:6: unexpected symbol
>> > 1:  ~ a variable
>> > ^
>> >> reformulate(sprintf("`%s`",z))
>> > ~`a variable` + `another variable`
>> 
>> 
>> 
>> 
>> > On 2019-03-29 11:54 a.m., J C Nash wrote:
>> >> The main thing is to post the "small reproducible example".
>> >> 
>> >> My (rather long term experience) can be written
>> >> 
>> >> if (exists("reproducible example") ) {
>> >> DeveloperFixHappens()
>> >> } else {
>> >> NULL
>> >> }
>> >> 
>> >> JN
>> >> 
>> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>> >>> Well, first I can't sign in bugzilla myself, that is why I wrote 
here first. Also, I don't know if I have the time at
>> >>> the moment to provide tests, multiple examples or more. If that is 
not ok or welcomed, that is fine, I can come back,
>> >>> whenever I have more time to properly report the bug.
>> >>> 
>> >>> I didn't find the existing bug report, sorry for that.
>> >>> 
>> >>> Yes, it is related. My problem was that I have column names with 
spaces and current solution doesn't solve it. I have a
>> >>> solution, which works for me and maybe also for others.
>> >>> 
>> >>> Either, someone can register me to bugzilla or I can post it here, 
which could give some direction to developers. I
>> >>> don't mind whichever is preferred here.
>> >>> 
>> >>> Best,
>> >>> 
>> >>> Saren
>> >>> 
>> >>> 
>> >>> On 29.03.19 09:29, Martin Maechler wrote:
>> > Saren Tasciyan
>> >  on Thu, 28 Mar 2019 17:02:10 +0100

Re: [Rd] subscripting a terms object

2019-04-05 Thread Martin Maechler

Dear Terry,

> Therneau, Terry M , Ph D via R-devel 
> on Thu, 4 Apr 2019 22:48:49 -0400 writes:

> Someone sent me a bug report for survival2.44.1-1 that involves a model 
with both cluster 
> and offset.  It turns out to be a 3 part issue with [.terms and my own 
untangle.specials 
> routine.   I've spent an evening sorting out the details.



>   1. The delete.response() function doesn't remove the response from the 
dataClasses 
> attribute, which leads to a later failure in [.terms for no-response 
models.  Is there a 
> reason for this, or can I make my patch include this oversight as well?

>  2. [.terms messes up predvars and dataClasses if the model has an offset 
term in it.  
> (In select cases 1 and 2 can cancel out and give the correct dataClasses 
attribute.)

The above two seem interesting and relevant to R itself.
As we've recently just fixed a buglet in  reformulate() --
probably unrelated to your problem --  I'd really be interested to see a
repr.ex. (reproducible example) for the above two statements.

... and if you want also a proposal on how to address the
problem in  delete.response()  and/or  `[.terms`()

Martin

>  3. The survival::untangle.specials routine assumed that you can use the 
same 
> subscripting for the terms of a model and the term() object itself, which 
turns out to be 
> almost always true, but only almost.

>   The failure turns out to have probably been there since the Splus days, 
which tells one 
> just how often such a model is used. (One of two edge case bugs sent to 
me in the first 
> days after I pushed it to CRAN: a new release seems to attact them.)   
I'm willing to put 
> together a patch, but given the rarity of these would folks prefer to 
wait until after the 
> April release?   I'm fine with that.  I need the answer to 1 though.

> Terry T.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Parsing code with newlines

2019-04-05 Thread Mikhail Titov

Hello!

This is my first post here. I came across the very same problem.
It can be reproduced within modified tests/Embedding/RParseEval.c

Actually this example has another issue, namely it doesn't wrap
everything in R_ToplevelExec . This is a major show stopper for
newcomers as that function is barely mentioned anywhere and longjmp into
terminated setuploop function followed by R_suicide look like a mystery.

Error: bad value
Fatal error: unable to initialize the JIT


That aside, here is the code with newlines that fails to parse. I hope
it will paste alright here.


#include "embeddedRCall.h"
#include 

int
main(int argc, char *argv[])
{
SEXP e, tmp;
int hadError;
ParseStatus status;

init_R(argc, argv);

PROTECT(tmp = mkString("\n\r ls()"));
PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
if (status != PARSE_OK)
{
printf("boo boo\n");
}
else
{
PrintValue(e);
R_tryEval(VECTOR_ELT(e,0), R_GlobalEnv, &hadError);
}
UNPROTECT(2);

end_R();
return(0);
}


--
Mikhail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Deep Replicable Bug With AMD Threadripper MultiCore

2019-04-05 Thread ivo welch

The following program is whittled down from a much larger program that
always works on Intel, and always works on AMD's threadripper with
lapply but not mclappy.  With mclapply on AMD, all processes go into
"suspend" mode and the program then hangs.  This bug is replicable on an
AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running
latest ubuntu 18.04.  The R version 3.5.3 (2019-03-11) -- "Great Truth" ,
invoked with --vanilla.  I hope this helps...it took quite a while to get
it to this stage.  I sure hope that I am not reporting an old bug...

options("mc.cores"=4)
library(data.table)
library(parallel)

if (!file.exists("bugsample.csv")) {
NR <- 6480
notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR),
x2=log(1:NR))
fwrite(notused, file="bugsample.csv")
stop("you can quit now and restart the program")
}

if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf)  ##
needed!  Inf cannot be replaced by actual NR


sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 )
) )
sample$yvar <- sin(1:nrow(sample))
sample$xvar <- 1:nrow(sample)


testfun <- function(dl) {
with(dl, message("Working: ", first(groupidentifier), " with ",
nrow(dl)))

lapply( 1:nrow(dl), FUN=function(onedayindex) {
if ((onedayindex %% 500) != 0) return(NULL)
with(dl[1:onedayindex,],
 c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,]
))[2], error = function(e) NA ) ) )
})
}


message("starting --- replicable hang with mclapply, but not lapply")

o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ),
  FUN=function(.index) testfun( sample[.index, , drop=FALSE] ))

message("never gets here with mclapply")

print( do.call("c", o[[1]]) )
print( do.call("c", o[[2]]) )



--
Ivo Welch (ivo.we...@ucla.edu)

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore

2019-04-05 Thread Dirk Eddelbuettel



On 4 April 2019 at 17:28, ivo welch wrote:
| The following program is whittled down from a much larger program that
| always works on Intel, and always works on AMD's threadripper with
| lapply but not mclappy.  With mclapply on AMD, all processes go into
| "suspend" mode and the program then hangs.  This bug is replicable on an
| AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running
| latest ubuntu 18.04.  The R version 3.5.3 (2019-03-11) -- "Great Truth" ,
| invoked with --vanilla.  I hope this helps...it took quite a while to get
| it to this stage.  I sure hope that I am not reporting an old bug...
| 
| options("mc.cores"=4)
| library(data.table)
| library(parallel)

Just how you set mc.cores to 4 for parallel::mclapply I would try throttling
data.table which in its current version goes for all cores. So do, say,

  setDTthreads(4)

and see if that helps. Try lower and lower values to see if you get by.
While there may well be a different race condition in mclapply, it may help
to not overschedule.

(FWIW, the next version of data.table, in queue at CRAN, is less aggressive
and has additional options for fine tuning.)

Dirk

| if (!file.exists("bugsample.csv")) {
| NR <- 6480
| notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR),
| x2=log(1:NR))
| fwrite(notused, file="bugsample.csv")
| stop("you can quit now and restart the program")
| }
| 
| if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf)  ##
| needed!  Inf cannot be replaced by actual NR
| 
| 
| sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 )
| ) )
| sample$yvar <- sin(1:nrow(sample))
| sample$xvar <- 1:nrow(sample)
| 
| 
| testfun <- function(dl) {
| with(dl, message("Working: ", first(groupidentifier), " with ",
| nrow(dl)))
| 
| lapply( 1:nrow(dl), FUN=function(onedayindex) {
| if ((onedayindex %% 500) != 0) return(NULL)
| with(dl[1:onedayindex,],
|  c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,]
| ))[2], error = function(e) NA ) ) )
| })
| }
| 
| 
| message("starting --- replicable hang with mclapply, but not lapply")
| 
| o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ),
|   FUN=function(.index) testfun( sample[.index, , drop=FALSE] ))
| 
| message("never gets here with mclapply")
| 
| print( do.call("c", o[[1]]) )
| print( do.call("c", o[[2]]) )
| 
| 
| 
| --
| Ivo Welch (ivo.we...@ucla.edu)
| 
|   [[alternative HTML version deleted]]
| 
| __
| R-devel@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore

2019-04-05 Thread Tomas Kalibera




In addition you can also try to use a PSOCK cluster (see makeCluster, 
parLapply) to avoid the problem - it should help if the problem is 
somehow related to forking in mclapply().


The problem you are seeing may be in base R, in data.table, or in 
interaction between the two (mclapply() from base R uses forking 
directly, data.table uses OpenMP). If you think the bug is in base R, it 
would be much better if you could find a reproducible example that would 
only use packages shipped directly with R, otherwise it might be best to 
contact the maintainer of data.table.


Please also make sure to use the latest version of R 3.5 (or R-devel). 
The implementation of forking in parallel packages, and hence also in 
mclapply, has been rewritten since R 3.4.


Best
Tomas

On 4/5/19 1:28 PM, Dirk Eddelbuettel wrote:

On 4 April 2019 at 17:28, ivo welch wrote:
| The following program is whittled down from a much larger program that
| always works on Intel, and always works on AMD's threadripper with
| lapply but not mclappy.  With mclapply on AMD, all processes go into
| "suspend" mode and the program then hangs.  This bug is replicable on an
| AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running
| latest ubuntu 18.04.  The R version 3.5.3 (2019-03-11) -- "Great Truth" ,
| invoked with --vanilla.  I hope this helps...it took quite a while to get
| it to this stage.  I sure hope that I am not reporting an old bug...
|
| options("mc.cores"=4)
| library(data.table)
| library(parallel)

Just how you set mc.cores to 4 for parallel::mclapply I would try throttling
data.table which in its current version goes for all cores. So do, say,

   setDTthreads(4)

and see if that helps. Try lower and lower values to see if you get by.
While there may well be a different race condition in mclapply, it may help
to not overschedule.

(FWIW, the next version of data.table, in queue at CRAN, is less aggressive
and has additional options for fine tuning.)

Dirk

| if (!file.exists("bugsample.csv")) {
| NR <- 6480
| notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR),
| x2=log(1:NR))
| fwrite(notused, file="bugsample.csv")
| stop("you can quit now and restart the program")
| }
|
| if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf)  ##
| needed!  Inf cannot be replaced by actual NR
|
|
| sample <- data.frame( groupidentifier=c( rep(1,2000), rep(2, 4500 )
| ) )
| sample$yvar <- sin(1:nrow(sample))
| sample$xvar <- 1:nrow(sample)
|
|
| testfun <- function(dl) {
| with(dl, message("Working: ", first(groupidentifier), " with ",
| nrow(dl)))
|
| lapply( 1:nrow(dl), FUN=function(onedayindex) {
| if ((onedayindex %% 500) != 0) return(NULL)
| with(dl[1:onedayindex,],
|  c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,]
| ))[2], error = function(e) NA ) ) )
| })
| }
|
|
| message("starting --- replicable hang with mclapply, but not lapply")
|
| o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ),
|   FUN=function(.index) testfun( sample[.index, , drop=FALSE] ))
|
| message("never gets here with mclapply")
|
| print( do.call("c", o[[1]]) )
| print( do.call("c", o[[2]]) )
|
|
|
| --
| Ivo Welch (ivo.we...@ucla.edu)
|
|   [[alternative HTML version deleted]]
|
| __
| R-devel@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

 > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

 > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"

 > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] [.terms issue

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel

As  footnote, the error in survival:::untangle.specials is that it assumed that 
if 
attr(myterms, 'specials')[['strata']] was = to 4, then one could use 
myterms[-4] to remove 
the strata term.   Not so.   In the model  y ~  x1 + x2 + strata(x3)  the 
attritube will 
be 4 -- the response counts --- but to remove it I need to use [-3] since the 
response 
does not count in [.terms or drop.terms.

Is this an inconsistency that should be documented and/or repaired?   What 
would break if 
we did?    I don't know.  By chance, all the usages in the survival package 
happened after 
a call to delete.response so it would be immune to such a change.

Terry T.


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] all.equal failure

2019-04-05 Thread Duncan Murdoch


On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

  > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

  > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"


As documented, all.equal() is generic, with methods for different 
classes.  The classes of both t0x and t1x are


 c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called.  That 
method isn't specifically documented, but you can see its definition as


function (target, current, ...)
{
if (length(target) != length(current))
return(paste0("target, current differ in having response: ",
length(target) == 3L, ", ", length(current) == 3L))
if (!identical(deparse(target), deparse(current)))
"formulas differ in contents"
else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same strings 
with no attributes shown, even though "showAttributes" is set by 
default.   I haven't traced through the C code to see where things are 
going wrong.


Duncan Murdoch



  > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel

Duncan,
   I should have included it in my original note, but

      all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at 
all.equal.default 
shows the following line right near the top:
     if (is.language(target) || is.function(target))
     return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.

I'll change my original original title to "all.equal was not a good tool for 
testing 
certain code issues".

Thanks for the pointer,

Terry



On 4/5/19 9:00 AM, Duncan Murdoch wrote:
> On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> This arose in testing [.terms and has me confused.
>>
>> data(esoph)   # use a standard data set
>>
>> t0x <- terms(model.frame( ~ tobgp, data=esoph))
>> t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
>> t1x <- (delete.response(t1))[-1]
>>
>>   > all.equal(t0x, t1x)
>> [1] TRUE
>>
>> # the above is wrong, because they actually are not the same
>>
>>   > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
>> [1] "Names: 1 string mismatch"
>> [2] "Lengths (1, 2) differ (string compare on first 1)"
>
> As documented, all.equal() is generic, with methods for different classes.  
> The classes 
> of both t0x and t1x are
>
>  c("terms","formula")
>
> with no all.equal.terms method, so all.equal.formula is called. That method 
> isn't 
> specifically documented, but you can see its definition as
>
> function (target, current, ...)
> {
>     if (length(target) != length(current))
>     return(paste0("target, current differ in having response: ",
>     length(target) == 3L, ", ", length(current) == 3L))
>     if (!identical(deparse(target), deparse(current)))
>     "formulas differ in contents"
>     else TRUE
> }
>
> So the issue is that deparse(t0x) and deparse(t1x) give the same strings with 
> no 
> attributes shown, even though "showAttributes" is set by default.   I haven't 
> traced 
> through the C code to see where things are going wrong.
>
> Duncan Murdoch
>
>>
>>   > sessionInfo()
>> R Under development (unstable) (2019-04-05 r76323)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 18.04.2 LTS
>>
>> Matrix products: default
>> BLAS:   /usr/local/src/R-devel/lib/libRblas.so
>> LAPACK: /usr/local/src/R-devel/lib/libRlapack.so
>>
>> locale:
>>    [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>    [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>    [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>    [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.7.0 tools_3.7.0
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: all.equal failure

2019-04-05 Thread Duncan Murdoch


On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:

Duncan,
   I should have included it in my original note, but

      all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at 
all.equal.default shows the following line right near the top:

     if (is.language(target) || is.function(target))
     return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.


Which R version are you using?  I see deparse(target) and 
deparse(current) in all.equal.language(), and those should not be 
ignoring attributes according to the documentation.


Duncan Murdoch



I'll change my original original title to "all.equal was not a good tool 
for testing certain code issues".


Thanks for the pointer,

Terry



On 4/5/19 9:00 AM, Duncan Murdoch wrote:

On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

  > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

  > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"


As documented, all.equal() is generic, with methods for different 
classes.  The classes of both t0x and t1x are


 c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called. That 
method isn't specifically documented, but you can see its definition as


function (target, current, ...)
{
    if (length(target) != length(current))
    return(paste0("target, current differ in having response: ",
    length(target) == 3L, ", ", length(current) == 3L))
    if (!identical(deparse(target), deparse(current)))
    "formulas differ in contents"
    else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same 
strings with no attributes shown, even though "showAttributes" is set 
by default.   I haven't traced through the C code to see where things 
are going wrong.


Duncan Murdoch



  > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel

On 4/5/19 9:39 AM, Duncan Murdoch wrote:

On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:

Duncan,
   I should have included it in my original note, but

      all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at 
all.equal.default shows the following line right near the top:

 if (is.language(target) || is.function(target))
 return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.

Which R version are you using?  I see deparse(target) and deparse(current) in 
all.equal.language(), and those should not be ignoring attributes according to the 
documentation.

I'm using today's version of R-devel on Ubuntu.  (svn up this AM)
But I agree, both target and current appear.

Duncan Murdoch

I'll change my original original title to "all.equal was not a good tool for testing 
certain code issues".

Thanks for the pointer,

Terry

On 4/5/19 9:00 AM, Duncan Murdoch wrote:

On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

  > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

  > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"

As documented, all.equal() is generic, with methods for different classes.  The 
classes of both t0x and t1x are

 c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called. That method isn't 
specifically documented, but you can see its definition as

function (target, current, ...)
{
    if (length(target) != length(current))
    return(paste0("target, current differ in having response: ",
    length(target) == 3L, ", ", length(current) == 3L))
    if (!identical(deparse(target), deparse(current)))
    "formulas differ in contents"
    else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no 
attributes shown, even though "showAttributes" is set by default.   I haven't traced 
through the C code to see where things are going wrong.

Duncan Murdoch

  > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Duncan Murdoch

On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:

On 4/5/19 9:39 AM, Duncan Murdoch wrote:

On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:

Duncan,
    I should have included it in my original note, but

       all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at
all.equal.default shows the following line right near the top:
  if (is.language(target) || is.function(target))
  return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.

Which R version are you using?  I see deparse(target) and deparse(current) in
all.equal.language(), and those should not be ignoring attributes according to 
the
documentation.

I'm using today's version of R-devel on Ubuntu.  (svn up this AM)
But I agree, both target and current appear.

That's not what I said.  I said that the attributes should not be 
ignored in that function.  I don't see anything in the R-devel version 
of it that ignores attributes:

> all.equal.language
function (target, current, ...)
{
mt <- mode(target)
mc <- mode(current)
if (mt == "expression" && mc == "expression")
return(all.equal.list(target, current, ...))
ttxt <- paste(deparse(target), collapse = "\n")
ctxt <- paste(deparse(current), collapse = "\n")
msg <- c(if (mt != mc) paste0("Modes of target, current: ",
mt, ", ", mc), if (ttxt != ctxt) {
if (pmatch(ttxt, ctxt, 0L)) "target is a subset of current" 
else if (pmatch(ctxt,
ttxt, 0L)) "current is a subset of target" else "target, 
current do not match when deparsed"

})
if (is.null(msg))
TRUE
else msg
}

Duncan Murdoch

Duncan Murdoch

I'll change my original original title to "all.equal was not a good tool for 
testing
certain code issues".

Thanks for the pointer,

Terry

On 4/5/19 9:00 AM, Duncan Murdoch wrote:

On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

   > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

   > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"

As documented, all.equal() is generic, with methods for different classes.  The
classes of both t0x and t1x are

  c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called. That method 
isn't
specifically documented, but you can see its definition as

function (target, current, ...)
{
     if (length(target) != length(current))
     return(paste0("target, current differ in having response: ",
     length(target) == 3L, ", ", length(current) == 3L))
     if (!identical(deparse(target), deparse(current)))
     "formulas differ in contents"
     else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no
attributes shown, even though "showAttributes" is set by default.   I haven't 
traced
through the C code to see where things are going wrong.

Duncan Murdoch

   > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
    [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
    [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0

 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Martin Maechler

> Duncan Murdoch 
> on Fri, 5 Apr 2019 11:12:48 -0400 writes:

> On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:
>> 
>> 
>> On 4/5/19 9:39 AM, Duncan Murdoch wrote:
>>> On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:
 Duncan,
    I should have included it in my original note, but

       all.equal(unclass(t0x), unclass(t1x))

 returns TRUE as well.  I had tried that as well.   But a further look 
at
 all.equal.default shows the following line right near the top:
  if (is.language(target) || is.function(target))
  return(all.equal.language(target, current, ...))

 and that path explicitly ignores attributes.
>>> 
>>> Which R version are you using?  I see deparse(target) and 
deparse(current) in
>>> all.equal.language(), and those should not be ignoring attributes 
according to the
>>> documentation.

But the problem is that indeed  "of course"  all.equal.formula()
and not all.equal.language() is called for the terms since as
you yourself remarked, their class is  c("terms", "formula"),

and so what Terry reported is indeed correct *and* a bug
and in "all versions" of R (I did not look far back, but these things
haven't changed much).

The cleanest would probably be to define an  all.equal.terms()
method, as I think there may be more code relying on the
behavior of  all.equal.formula() to only look at the formulas
themselves and not their attributes...
but you (Duncan) and others may have a different opinion.

Martin Maechler
ETH Zurich and R Core Team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Martin Maechler

> Martin Maechler 
> on Fri, 5 Apr 2019 17:33:54 +0200 writes:

> Duncan Murdoch 
> on Fri, 5 Apr 2019 11:12:48 -0400 writes:

>> On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:
>>> 
>>> 
>>> On 4/5/19 9:39 AM, Duncan Murdoch wrote:
 On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:
> Duncan,
>    I should have included it in my original note, but
> 
>       all.equal(unclass(t0x), unclass(t1x))
> 
> returns TRUE as well.  I had tried that as well.   But a further look 
at
> all.equal.default shows the following line right near the top:
>  if (is.language(target) || is.function(target))
>  return(all.equal.language(target, current, ...))
> 
> and that path explicitly ignores attributes.
 
 Which R version are you using?  I see deparse(target) and 
deparse(current) in
 all.equal.language(), and those should not be ignoring attributes 
according to the
 documentation.

> But the problem is that indeed  "of course"  all.equal.formula()
> and not all.equal.language() is called for the terms since as
> you yourself remarked, their class is  c("terms", "formula"),

> and so what Terry reported is indeed correct *and* a bug
> and in "all versions" of R (I did not look far back, but these things
> haven't changed much).

> The cleanest would probably be to define an  all.equal.terms()
> method, as I think there may be more code relying on the
> behavior of  all.equal.formula() to only look at the formulas
> themselves and not their attributes...
> but you (Duncan) and others may have a different opinion.

and I do agree with Duncan even more now that indeed it's very
unsatisfactory that deparse() {and dput(), dump() ..} of a terms
object would only reproduce the formula and nothing else;
and yes that's all in the C code:
 --> src/main/deparse.c
--> in function deparse2buff()
   -->  inside the (350 lines large)  'case LANGSXP'.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Duncan Murdoch

On 05/04/2019 11:33 a.m., Martin Maechler wrote:

Duncan Murdoch
 on Fri, 5 Apr 2019 11:12:48 -0400 writes:

 > On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:
 >>
 >>
 >> On 4/5/19 9:39 AM, Duncan Murdoch wrote:
 >>> On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:
  Duncan,
     I should have included it in my original note, but

        all.equal(unclass(t0x), unclass(t1x))

  returns TRUE as well.  I had tried that as well.   But a further look 
at
  all.equal.default shows the following line right near the top:
   if (is.language(target) || is.function(target))
   return(all.equal.language(target, current, ...))

  and that path explicitly ignores attributes.
 >>>
 >>> Which R version are you using?  I see deparse(target) and 
deparse(current) in
 >>> all.equal.language(), and those should not be ignoring attributes 
according to the
 >>> documentation.

But the problem is that indeed  "of course"  all.equal.formula()
and not all.equal.language() is called for the terms since as
you yourself remarked, their class is  c("terms", "formula"),

and so what Terry reported is indeed correct *and* a bug
and in "all versions" of R (I did not look far back, but these things
haven't changed much).

The cleanest would probably be to define an  all.equal.terms()
method, as I think there may be more code relying on the
behavior of  all.equal.formula() to only look at the formulas
themselves and not their attributes...
but you (Duncan) and others may have a different opinion.

I don't know if that would be easy -- it seems to me there is a bug in 
deparse(), which won't show attributes on language objects even if you 
ask it to:

# This is fine:
deparse(structure(1, attrib=2))
# [1] "structure(1, attrib = 2)"

# This doesn't show the attributes
deparse(structure(quote(f(1)), attrib=2))
# [1] "f(1)"

But as you mention, if this isn't a new bug fixing it will likely cause 
problems for people who assume it is intentional...

Duncan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] all.equal failure and [.terms

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel

The all.equal was a side issue for me; I don't have strong opinions one way or 
the other.  
You are welcome to leave me out of the loop on that.  (Or leave me on the cc, 
whatever is 
easiest).

  I will update the survival package once the [.terms issues are addressed.

  One debatable issues is the choice of change vs document for the offset() 
issue.  With 
my proposed fix or without it, offsets are completely ignored by [.terms and 
dropterms.  
So with a formula of

    z <- terms(y~ x1 + offset(x2) + x3)

the 2 in drop.terms(z, 2) or z[-2] refers to x3, and the result will drop both 
the offset 
and x3.  For the use cases that I can think of the two functions are used at 
the 'build 
the X matrix' stage, offsets have already been accounted for, and the present 
behavior is 
fine.  My vote would be to document it with a few lines in the help file since 
that is the 
easiest.  Offsets don't count as a 'term' in the assign attribute either so the 
current 
behavior is consistent in that respect.

Terry T.


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Bug in the "reformulate" function in stats package

Re: [Rd] subscripting a terms object

Re: [Rd] Parsing code with newlines

[Rd] Deep Replicable Bug With AMD Threadripper MultiCore

Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore

Re: [Rd] Deep Replicable Bug With AMD Threadripper MultiCore

[Rd] all.equal failure

[Rd] [.terms issue

Re: [Rd] all.equal failure

Re: [Rd] [EXTERNAL] Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

Re: [Rd] all.equal failure and [.terms

17 matches

Site Navigation

Mail list logo

Footer information