date:20171021

Re: [Rd] Illegal Logical Values

2017-10-21 Thread Radford Neal

> On Fri, 2017-10-20 at 14:01 +, brodie gaslam via R-devel wrote:

> > I'm thinking of this passage:
> > 
> > > Logical values are sent as 0 (FALSE), 1 (TRUE) or INT_MIN =
> > > -2147483648 (NA, but only if NAOK is true), and the compiled code
> > > should return one of these three values. (Non-zero values other
> > > than INT_MIN are mapped to TRUE.) 
> > 
> > The parenthetical seems to suggest that something like 'LOGICAL(x)[0]
> > = 2;' will be treated as TRUE, which it sometimes is, and sometimes
> > isn't:
> 
> From: Martyn Plummer 

> The title of Section 5.2 is "Interface functions .C and .Fortran" and
> the text above refers to those interfaces. It explains how logical
> vectors are mapped to C integer arrays on entry and back again on exit.
> 
> This does work as advertised. 


Not always.  As I reported on bugzilla three years ago (PR#15878), it
only works if the logical argument does not have to be copied.  The
bug has been fixed in pqR since pqR-2014-09-30.

   Radford Neal

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Function 'factor' issues

2017-10-21 Thread Suharto Anggono Suharto Anggono via R-devel

My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
- For remapping, use
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
(I have mentioned it).
- Remap only if length(nlevs) differs from length(xlevs) .

On use of 'order' in function 'factor' in R devel, factor.Rd still says 
'sort.list' in "Details" section.

My comments on the part of "Details" section:
- Sortable 'x' is needed only when 'levels' is not specified.
- Complete requirements for properly working factor(x) in R devel: 
'as.character', 'order', 'unique' corresponding to '['. Take data frame and 
"Surv" object (package survival) as examples.

On Wed, 18/10/17, Martin Maechler  wrote:

 Subject: Re: [Rd] Function 'factor' issues

 Cc: r-devel@r-project.org
 Date: Wednesday, 18 October, 2017, 11:54 PM

> Suharto Anggono Suharto Anggono via R-devel 
>     on Sun, 15 Oct 2017 16:03:48 + writes:

    > In R devel, function 'factor' has been changed, allowing and merging 
duplicated 'labels'.

Indeed.  That had been asked for and discussed a bit on this
list from June 14 to June 23, starting at
   https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

    > Issue 1: Handling of specified 'labels' without duplicates is slower than 
before.
    > Example:
    > x <- rep(1:26, 4)
    > system.time(factor(x, levels=1:26, labels=letters))

    > Function 'factor' is already rather slow because of conversion to 
character. Please don't add slowdown.

Indeed, I doo see a ~ 20%  performance loss for the example
above, and I may get to look into this.
However, in R-devel there have been important internal
changes (ALTREP additions) some of which are currently giving
some performance losses in some cases (but they have the
potential to give big performance _gains_ e.g. for simple
indexing into large vectors which may apply here !).
For factor(), these C level "ALTREP" changes may not be the reason at
all for the slow down;
I may find time to investigate further.

{{ For the ALTREP-change slowdowns I've noticed in some
   indexing/subset operations, we'll definitely have time to look into
   before R-devel is going to be released next spring... and as mentioned,
   these operations may even become considerably faster *thanks*
   to ALTREP ... }}

    > Issue 2: While default 'labels' is 'levels', not specifying 'labels' may 
be different from specifying 'labels' to be the same as 'levels'.

    > Example 1:
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
    > is different from
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA), 
exclude = NULL))

You are right.  But this is not so exceptional and part of the new feature of
'labels' allowing to "fix up" things in such cases.  While it
would be nice if this was not the case the same phenomenon
happens in other functions as well because of lazy evaluation.
I think I had noticed that already and at the time found
"not easy" to work around.
(There are many aspects about changing such important base functions:
 1. not breaking back compatibility ((unless in rare
    border cases, where we are sure it's worth))
 2. Keeping code relatively transparent
 3. Keep the semantics "simple" to document and as intuitive as possible
)

    > File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly 
changed, for the better. NA entry (because it is unmatched to 'levels' argument 
or is in 'exclude') is absorbed into NA in "levels" attribute (comes from 
'labels' argument), if any. The issue is that it happens only when 'labels' is 
specified.

I'm not sure anymore, but I think I had noticed that also in
June, considered to change it and found that such a changed
factor() would be too different from what it has "always been".
So, yes, IIRC, this current behavior is on purpose, if only for back 
compatibility.

    > Function 'factor' could use match(xlevs, nlevs)[f]. It doesn't match NA 
to NA level. When 'f' is long enough, longer than 'xlevs', it is faster than 
match(xlevs[f], nlevs).

    > Example 2:
    > With
    > levs <- c("A","A")  ,
    > factor(levs, levels=levs)
    > gives error, but
    > factor(levs, levels=levs, labels=levs)
    > doesn't.

yes, again that is a consequence of what you said above (before
'Example 1')

    > Note: In theory, if function 'factor' merged duplicated 'labels' in all 
cases, at least in
    > factor(c(sqrt(2)^2, 2))  ,
    > function 'factor' could do matching on original 'x' (without conversion 
to character), as in R before version 2.10.0. If function 'factor' did it,
    > factor(c(sqrt(2)^2, 2), levels = c(sqrt(2)^2, 2), labels = c("sqrt(2)^2", 
"2"))
    > could take sqrt(2)^2 and 2 as distinct.

Well, that may be interesting.. but I doubt if that's somewhere
we should go, easily, because  factor() has been documented to do
what it does now (with very slightly rounding such numbers via as.character(.))
and

Re: [Rd] Illegal Logical Values

Re: [Rd] Function 'factor' issues

2 matches

Site Navigation

Mail list logo

Footer information