date:20190517

[Rd] Give update.formula() an option not to simplify or reorder the result -- request for comments

2019-05-17 Thread Pavel N. Krivitsky

Dear All,

Martin Maechler has asked me to send this to R-devel for discussion
after I submitted it as an enhancement request (
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17563).

At this time, the update.formula() method always performs a number of
transformations on the results, eliminating redundant variables and
reordering interactions to be after the main effects. This is not
always the desired behaviour, because formulas are increasingly used
for purposes other than specifying linear models.

Thus, the proposal is to add an option simplify= (defaulting to TRUE,
for backwards compatibility) that, if FALSE, will skip the simplification
step.

That is,

> update(a~b:c+b, .~.+b) # default: simplify=TRUE

a ~ b + b:c

> update(a~b:c+b, .~.+b, simplify=FALSE) # results are a mock-up

a ~ b:c + b + b

From what I can tell, this can be accomplished by skipping the second
line of the implementation of update.formula() ("out <-
formula(terms.formula(tmp, simplify = TRUE))").
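
The simplification step quoted above can be tried in isolation. The following is only a sketch: it applies terms.formula() to a hand-written formula (the variable names f, simplified and labels_simplified are made up for the illustration) rather than going through update.formula() itself, and it does not implement the proposed simplify= argument.

```r
# A formula with a duplicated term and an interaction listed before its
# main effect, similar to the unsimplified mock-up in the proposal.
f <- a ~ b:c + b + b

# The step that update.formula() applies to its result: build a terms
# object with simplify = TRUE and turn it back into a formula.
simplified <- formula(terms.formula(f, simplify = TRUE))

# The term labels show what survives: duplicates are dropped and the
# main effect is ordered before the interaction.
labels_simplified <- attr(terms.formula(f), "term.labels")

print(f)
print(simplified)
print(labels_simplified)
```

Skipping this step is what would leave the terms in the order and multiplicity the user wrote them.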

Any thoughts? One particular question that Martin raised is whether the
UI should be just a single logical argument, or something else.

Best Regards,
Pavel

-- 
Pavel Krivitsky
Lecturer in Statistics
National Institute of Applied Statistics Research Australia (NIASRA)
School of Mathematics and Applied Statistics | Building 39C Room 154
University of Wollongong NSW 2522 Australia
T +61 2 4221 3713
Web (NIASRA): http://niasra.uow.edu.au/index.html
Web (Personal): http://www.krivitsky.net/research
ORCID: -0002-9101-3362

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

2019-05-17 Thread Martin Maechler

> Gabriel Becker 
> on Thu, 16 May 2019 15:47:57 -0700 writes:

> Hi Hadley,
> Thanks for the counterpoint. Response below.

> On Thu, May 16, 2019 at 1:59 PM Hadley Wickham wrote:

>> The existing behaviour seems intuitive to me. I would consider these
>> invariants for n vector x_i's each with size m:
>> 
>> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>> 

> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with rows
> but no columns a very strange concept. The converse is not true, I
> understand the utility of columns but no rows, particularly in the
> data.frame case, but rows with no columns are observations we didn't
> observe anything about. Strange, imho.

Gabe, here I have to very strongly disagree.

Matrices (and higher-order arrays) are always meant to
behave "symmetrically" / "uniformly" with respect to all of their dimensions.

We (and the S developers before us) have always taken a lot of
care trying to ensure that this is true.

So for the matrix case, if rows and columns behaved differently
that would be a bug "by definition".

Of course there's one place where this uniformity / symmetry
must be violated: in the coercion from and to atomic vectors.
There, 'by column' (generalized for arrays to "earlier dimensions vary faster
than later ones") has been chosen, not least because this had
been adopted for Fortran (first, AFAIK) and all related ABIs
dealing with matrix and vector arithmetic, for very good (numerical,
performance, known convention) reasons that made it clear how
fast numerical linear algebra should be implemented.
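
For readers following the thread, here is a small illustration of the two points above: the rbind()/cbind() symmetry for zero-length inputs, and the column-major coercion between vectors and matrices. It is a sketch in base R; the names r, k and m are chosen only for this example.

```r
# Row-binding two empty character vectors gives 2 rows and 0 columns.
r <- rbind(character(), character())
# Column-binding them gives the mirror image: 0 rows and 2 columns.
k <- cbind(character(), character())
print(dim(r))
print(dim(k))

# Coercion between vectors and matrices is 'by column':
# the first index varies fastest.
m <- matrix(1:6, nrow = 2)
print(m)
print(as.vector(m))
```

The two bind results are transposes of each other, which is the row/column uniformity being defended here.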

Martin


Re: [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

2019-05-17 Thread Gabriel Becker

Hi Martin,

Thanks for chiming in. Responses inline.

On Fri, May 17, 2019 at 12:32 AM Martin Maechler 
wrote:

> > Gabriel Becker
> > on Thu, 16 May 2019 15:47:57 -0700 writes:
>
> > Hi Hadley,
> > Thanks for the counterpoint. Response below.
>
> > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham 
> wrote:
>
> >> The existing behaviour seems intuitive to me. I would consider these
> >> invariants for n vector x_i's each with size m:
> >>
> >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
> >>
>
> > Personally, no I wouldn't. I would consider m==0 a degenerate case,
> where
> > there is no data, but I personally find matrices (or data.frames)
> with rows
> > but no columns a very strange concept. The converse is not true, I
> > understand the utility of columns but no rows, particularly in the
> > data.frame case, but rows with no columns are observations we didn't
> > observe anything about. Strange, imho.
>
> Gabe, here I have to very strongly disagree.
>
> Matrices (and higher order Arrays)  are  always definitely to
> behave "symmetrically" / "uniformly" with respect to all of their
> dimensions.
>
> We (and the S developers before us) have always taken a lot of
> care trying to ensure that this is true.
>
> So for the matrix case, if rows and columns behaved differently
> that would be a bug "by definition".
>

I realize now I could have been clearer/more explicit about this, but I
wasn't arguing that the behavior should be different between columns and
rows, just that the behavior in the rows case didn't necessarily make a ton
of sense to me. I was arguing that a change to both rbind and cbind be
considered when all length-zero vectors are passed, not that rbind change
without cbind also changing. I will admit even here to feeling much more
strongly about the data.frame case.

That said, I do see that the cbind/columns argument seems harder (though
not impossible) for me to make. And maybe that's a good enough reason not
to consider such a change, because as I say, I agree the symmetry is
important, and would (also) want cbind to change the same way rbind did if
such a change happened, and that might bother many more people than the
rbind case would. Maybe not though, based on the other responses in the
thread.

Honestly, the most intuitive thing for me if you rbind or cbind a bunch of
length-zero vectors together would be a 0x0 matrix, at the very least in
the non-named arguments case. It's a matrix with 0 elements in it, after
all. It seems perhaps that my intuition is just somewhat non-standard
though.

> Of course there's one thing where this uniformity / symmetry
> must be violated: in the coercion from and to atomic vectors:
> There, 'by column' (generalized for arrays to "earlier dimensions vary
> faster
> than later one") has been chosen, not the least because this had
> been adapted for Fortran (first, AFAIK) and all related ABIs
> dealing with Matrix vector arithmetic for very good (numerical,
> performance, known convention) reasons that enabled to know how
> fast numerical linear algebra should be implemented.
>

I do understand here, and would never suggest anything that could damage
numerical linear algebra capabilities, in R or more broadly. That said, can
numerical algebra routines operate meaningfully in the degenerate case where
one/both/all dimensions are 0 anyway? Maybe they do; I'd be somewhat
surprised, but it's not my area of expertise.

 Best,
~G

>
> Martin
>


Re: [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

2019-05-17 Thread Martin Maechler

> Gabriel Becker 
> on Fri, 17 May 2019 01:06:11 -0700 writes:

> Hi Martin,
> Thanks for chiming in. Responses inline.

> On Fri, May 17, 2019 at 12:32 AM Martin Maechler wrote:

>> > Gabriel Becker
>> > on Thu, 16 May 2019 15:47:57 -0700 writes:
>> 
>> > Hi Hadley,
>> > Thanks for the counterpoint. Response below.
>> 
>> > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham 
>> wrote:
>> 
>> >> The existing behaviour seems intuitive to me. I would consider these
>> >> invariants for n vector x_i's each with size m:
>> >>
>> >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>> >>
>> 
>> > Personally, no I wouldn't. I would consider m==0 a degenerate case,
>> where
>> > there is no data, but I personally find matrices (or data.frames)
>> with rows
>> > but no columns a very strange concept. The converse is not true, I
>> > understand the utility of columns but no rows, particularly in the
>> > data.frame case, but rows with no columns are observations we didn't
>> > observe anything about. Strange, imho.
>> 
>> Gabe, here I have to very strongly disagree.
>> 
>> Matrices (and higher order Arrays)  are  always definitely to
>> behave "symmetrically" / "uniformly" with respect to all of their
>> dimensions.
>> 
>> We (and the S developers before us) have always taken a lot of
>> care trying to ensure that this is true.
>> 
>> So for the matrix case, if rows and columns behaved differently
>> that would be a bug "by definition".
>> 

> I realize now I could have been clearer/more explicit about this, but I
> wasn't arguing that the behavior should be different between columns and
> rows, just that the behavior in the rows case didn't necessarily make a ton
> of sense to me. I was arguing that a change to both rbind and cbind be
> considered when all length-zero vectors are passed, not that rbind change
> without cbind also changing. I will admit even here to feeling much more
> strongly about the data.frame case.

> That said, I do see that the cbind/columns argument seems harder (though
> not impossible) for me to make. And maybe that's a good enough reason not
> to consider such a change, because as I say, I agree the symmetry is
> important, and would (also) want cbind to change the same way rbind did if
> such a change happened, and that might bother many more people than the
> rbind case would. Maybe not though, based on the other responses in the
> thread.

> Honestly, the most intuitive thing for me if you rbind or cbind a bunch of
> length-zero vectors together would be a 0x0 matrix, at the very least in
> the non-named arguments case. It's a matrix with 0 elements in it, after
> all. It seems perhaps that my intuition is just somewhat non-standard
> though.

I think your "problem" may be that you've not yet appreciated
the importance of {0 x p} and {n x 0} matrices, and would
think all of these should be {0 x 0}?

Believe me, we did quite a bit of reasoning and looking at the
associative law, transitivity, etc. at the time, which I can't easily
recall, but it has been very beneficial to
deal consistently with n x 0 and 0 x d matrices:
much R code could be simplified, or automagically worked
correctly in edge cases, once such matrices fulfilled
basic consistency identities.
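
A few of the identities that zero-extent matrices satisfy can be checked directly. This is a sketch in base R, not an account of the reasoning done at the time; the names n, p, A, B, X, AB, XB and cs are illustrative.

```r
n <- 3L
p <- 2L
A <- matrix(numeric(), nrow = n, ncol = 0L)   # n x 0
B <- matrix(numeric(), nrow = 0L, ncol = p)   # 0 x p

# The product of an n x 0 and a 0 x p matrix is a well-defined n x p
# matrix; every entry is an empty sum, i.e. 0.
AB <- A %*% B

# Row-binding a 0 x p block onto a matrix with p columns changes nothing,
# so loops that accumulate rows need no special case for "no rows yet".
X <- matrix(c(1, 2, 3, 4), nrow = 2L, ncol = p)
XB <- rbind(X, B)

# Column sums of a 0 x p matrix are p zeros, not an error.
cs <- colSums(B)

print(dim(AB))
print(dim(XB))
print(cs)
```

With a 0 x 0 result for every empty input, the first and third results would lose their p dimension and downstream code would need edge-case branches.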

Martin


[Rd] Barplot & Boxplot: lim and horizontal

2019-05-17 Thread Colin Gillespie

Dear All,

I've noticed an inconsistency between boxplot & barplot regarding the
interaction between switching to a horizontal graph and the limits.

par(mfrow = c(2, 2))
boxplot(1:10, xlim = c(0.1, 5), ylim = c(1, 10),
        log = "y", horizontal = FALSE, xlab = "X", ylab = "Y")
axis(1)
# Changing to horizontal, xlim <-> ylim
# log = y is still the y-axis. ylab is still the y-axis
boxplot(1:10, xlim = c(0.1, 5), ylim = c(1, 10),
        log = "y", horizontal = TRUE, xlab = "X", ylab = "Y")
axis(2)

barplot(2, xlim = c(0.1, 5), ylim = c(0.1, 10),
        log = "y", horiz = FALSE, xlab = "X", ylab = "Y")
axis(1)
# Changing to horizontal, xlim still refers to xlim
# log = y is still the y-axis. ylab is still the y-axis
barplot(2, xlim = c(0.1, 5), ylim = c(0.1, 10),
        log = "y", horiz = TRUE, xlab = "X", ylab = "Y")
axis(2)

Thanks

Colin


Re: [Rd] ALTREP: Bug reports

2019-05-17 Thread 介非王

Thank you very much for your answer. If I understand it correctly, for an
ALTREP class, a non-deep copy only creates a new ALTREP object that refers
to the same underlying SEXP as the old ALTREP object. Is that correct?
But since they share the same underlying SEXP, will changing a value
in the old ALTREP object also change the value in the new ALTREP
object? Or do you mean we need to decide which SEXPs have to be copied even
when *deep==FALSE*? I wrote a small test:

> x=runif(10)
> so1=sharedObject(x)
> so2=so1
> so2[1]=10


The last line of the code will call the duplicate function with
*deep==FALSE*, which does not sound correct to me if we don't do a deep
copy of the SEXP.
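
For comparison, this is the copy-on-modify behaviour that ordinary R vectors and lists show at the R level, and which a duplicate method would presumably have to preserve. The sketch uses plain base R objects only; it does not use sharedObject() or any ALTREP class, and the list mirrors the one in the quoted reply below.

```r
x <- runif(10)
y <- x        # both names can refer to the same vector at this point
y[1] <- 10    # the vector is duplicated before the write, so x is untouched

lst  <- list(a = 1:5, b = 2:20, c = c(TRUE, FALSE))
lst2 <- lst
lst[[2]] <- "hi there"   # only the list "container" has to change

print(x[1] == y[1])
print(lst2$b)
print(identical(lst$a, lst2$a))
```

At the R level the old object never sees the modification; whether the element data are copied or shared underneath is the part that deep controls.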

Best,
Jiefei

On Thu, May 16, 2019 at 3:07 PM Gabriel Becker 
wrote:

> Jiefei,
>
> Inline.
>
> On Thu, May 16, 2019 at 2:30 PM 介非王  wrote:
>
>> Hello Luke and Gabriel,
>>
>> Thank you very much for your quick responses. The explanation of STDVEC
>> is very helpful and I appreciate it! For the wrapper, I have a few new
>> questions.
>>
>>
>> 1. Like Luke said a mutable object is not possible. However, I noticed
>> that there is one extra argument *deep* in the function duplicate. I've
>> googled all the available documentation for ALTREP but I did not find any
>> explanation of it. Could you please give some detail on it?
>>
>
> Deep refers to compound/nested structures, most easily illustrated by
> a list in R (or VECSXP in C): do the elements
> need to be duplicated (deep == TRUE) or *only* the "container" SEXP?
>
> Consider an R list:
>
> x = 1:5
>
> y = 2:20
>
> z= c(TRUE, FALSE)
>
> w = "hi there"
>
> lst = list(a= x, b = y, c =z)
>
> lst2 =lst # NAMED == 2, more than one symbol pointing to it
>
> And we want to modify lst like so
>
> lst[[2]] = w
>
> We need to duplicate the "container SEXP", i.e. the VECSXP, so that lst's
> SEXP and lst2's SEXP point to different SEXPs in their second element, but
> we don't need to duplicate any SEXPs that represent the data in any of the
> elements (the SEXPs bound to symbols x, y, z, and w), because none of those
> were modified.
>
> Thus, if deep == FALSE, those element SEXPs are NOT duplicated, just the
> top-level one is. If deep == TRUE, then the element SEXPs are duplicated too,
> because R decided it needed that to happen for some reason.
>
> In terms of implementing an ALTREP class, you can either a) just ignore
> deep and *always* do a deep (i.e. full) duplication of everything in your
> ALTREP class, or b) pay attention to it and always create a new
> ALTREP, but one which can potentially - *ONLY in cases where deep==FALSE* -
> not duplicate the SEXPs that make up its alternative representation,
> provided you're careful about making sure that duplication happens at
> a later time if necessary.
>
> I'd strongly suggest starting with option (a) just to have something
> working and completely safe, then considering if it's important enough to
> you to look into (b).
>
> Does that make sense?
>
> Best,
> ~G
>
>
>
>
>>
>> 2.
>>
>>> The first one correctly returns its internal data structure, but the
>>> second one returns the ALTREP object it wraps since the wrapper itself
>>> is an ALTREP. This behavior is unexpected.
>>
>>> I disagree. R_altrep_data1 returns whatever THAT altrep SEXP stores in
>>> its "data1" part. There is no recursion/descent going on, and there
>>> shouldn't be.
>>
>> This might be a bug, since in R release 3.6 it will return the ALTREP
>> instead of the data of the ALTREP. I'm not sure if it has been fixed in
>> 3.7. Here is a simple example:
>>
>> SEXP C_peekSharedMemory(SEXP x) {
>>     while (ALTREP(x)) {
>>         Rprintf("getting data 1\n");
>>         x = R_altrep_data1(x);
>>     }
>>     return(x);
>> }
>>
>>
>> If calling R_altrep_data1 returned the internal data directly, we would
>> only see one message. Following my last example:
>>
>> > .Internal(inspect(so1))
>> @0x05e7fbb0 14 REALSXP g0c0 [MARK,NAM(7)]  Share object of type double
>> > .Internal(inspect(so2))
>> @0x05fc5ac0 14 REALSXP g0c0 [MARK,NAM(7)]  wrapper [srt=-2147483648,no_na=0]
>>   @0x05e7fbb0 14 REALSXP g0c0 [MARK,NAM(7)]  Share object of type double
>> > sm1=peekSharedMemory(so1)
>> getting data 1
>> > sm2=peekSharedMemory(so2)
>> getting data 1
>> getting data 1
>>
>>
>> We see that so2 calls R_altrep_data1 twice to get the internal data. This
>> is very unexpected.
>>
>> Thank you very much for your help again!
>>
>> Best,
>> Jiefei
>>
>>
>>
>> On Thu, May 16, 2019 at 3:47 PM Gabriel Becker 
>> wrote:
>>
>>> Hi Jiefei,
>>>
>>> Thanks for trying out the ALTREP stuff and letting us know how it is
>>> going. That said, I don't think either of these are bugs, per se, but rather
>>> a misunderstanding of the API. Details inline.
>>>
>>>
>>>
>>> On Thu, May 16, 2019 at 11:57 AM 介非王  wrote:
>>>
 Hello,

 I have encountered two bugs when using ALTREP APIs.

 1. STDVEC_DATAPTR

 From RInternal.h

Re: [Rd] print.() not called when autoprinting

2019-05-17 Thread Abby Spurdle

I don't know the answer to your question.
However, here's a side issue that may be relevant.

Last year, I tried creating my own ecdf object, and redefined the print
method for ecdf.

It worked ok in the console, interactively.
However, when I tried calling the method (with autoprinting) inside an
Sweave document, the stats package method was used instead of my method.
I never determined why this was happening.
However, R check generated a warning later, so I renamed the classes and
methods.


Abs


On Fri, May 17, 2019 at 6:57 AM William Dunlap via R-devel <
r-devel@r-project.org> wrote:
>
> In R-3.6.0 autoprinting was changed so that print methods for the storage
> modes are not called when there is no explicit class attribute.   E.g.,
>
> % R-3.6.0 --vanilla --quiet
> > print.function <- function(x, ...) { cat("Function with argument list ");
> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> > f <- function(x, ...) { sum( x * seq_along(x) ) }
> > f
> function(x, ...) { sum( x * seq_along(x) ) }
> > print(f)
> Function with argument list function (x, ...)
>
> Previous to R-3.6.0 autoprinting did call such methods
> % R-3.5.3 --vanilla --quiet
> > print.function <- function(x, ...) { cat("Function with argument list ");
> cat(sep="\n", head(deparse(args(x)), -1)); invisible(x) }
> > f <- function(x, ...) { sum( x * seq_along(x) ) }
> > f
> Function with argument list function (x, ...)
> > print(f)
> Function with argument list function (x, ...)
>
> Was this intentional?
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
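
The report above says the change affects objects with no explicit class attribute. One thing that may be worth trying, sketched here as an assumption rather than a confirmed workaround, is to give the object an explicit class and define the print method for that class. The class name "annotated_function" is made up for this illustration. Autoprinting only happens at the top-level prompt, so the sketch calls print() explicitly, which dispatches in both R versions shown.

```r
# Print method for a hypothetical class "annotated_function".
print.annotated_function <- function(x, ...) {
  cat("Function with argument list ")
  cat(sep = "\n", head(deparse(args(x)), -1))
  invisible(x)
}

f <- function(x, ...) { sum(x * seq_along(x)) }
# Explicit class attribute; "function" is kept so inherits() still works.
class(f) <- c("annotated_function", "function")

out <- capture.output(print(f))
print(out)
print(f(c(1, 2, 3)))   # the object is still callable
```

Typing f at the prompt is the case to check interactively in R 3.6.0, since that is where the reported difference shows up.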
