On 9/18/21 5:28 AM, Leonard Mada via R-help wrote:
Hello Andrew,


I am adding this information for completeness (so that other users can
get a better understanding):

If we want to perform a survival analysis, then the intervals should be
closed on the right, but we should also include the first time point (as
per Intention-to-Treat):

[0, 4], (4, 8], (8, 12], (12, 16]

[0, 4], (4, 8], (8, 12], (12, 16], (16, 20]


So the series is extendible to the right without any errors!

But the 1st interval (which is the same in both series) is different
from the other intervals: [0, 4].
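
The two series above can be reproduced with the existing arguments of cut(). This is only a sketch: it reuses the x and break points from Andrew's example elsewhere in this thread, and the names surv1, surv2, in_both and same_bins are illustrative.

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# right = TRUE is the default; include.lowest = TRUE closes the first bin on the left
surv1 <- cut(x, breaks1, include.lowest = TRUE)
surv2 <- cut(x, breaks2, include.lowest = TRUE)

levels(surv1)
levels(surv2)

# Values covered by both series fall into the same bin in each of them
in_both <- x <= 16
same_bins <- identical(as.character(surv1[in_both]), as.character(surv2[in_both]))
same_bins
```

With the shorter series, the values 17 to 20 are not covered and come back as NA.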


I feel that this should have been the default behaviour for cut().

To Leonard;

If you do not like the behavior of `cut`, then you should "roll your own". It's very unlikely that R Core will modify a base function like cut. You might want to look at Hmisc::cut2. Frank Harrell didn't like that default behavior and thought he could make a better cut, so he just put it in his package. I did like his version better and often used it when I was actively programming. I suspect there is also a tidyverse cut-like function, but I'm not terribly familiar with that fork of R. (It's really not the same language IMHO.)

But it's a waste of time and energy to try to propose modifications of core R functions unless *you* can show that the change is stable across 20,000 packages and will not offend long-time users. The likelihood of that happening for your proposal is vanishingly small in my estimation. You shouldn't ask R Core to do that for you. They are busy fixing real bugs.


If you want to persist despite my negativity, then you should make a complete proposal by submitting a proper diff file that incorporates your tested efforts to the R-devel mailing list.


--

David


Note:

In my previous message I was led to think about a different situation,
because you constructed intervals that are open on the right and also
extended the series to the right. But for survival analysis the intervals
should be as described in this mail, and this should probably be the default.


Sincerely,


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:
I disagree, I don't really think it's too long or ugly, but if you
think it is, you could abbreviate it as 'i'.


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
# 'i' partially matches the 'include.lowest' argument of cut.default()
data.frame(
     cut(x, breaks1, right = FALSE, i = TRUE),
     cut(x, breaks2, right = FALSE, i = TRUE),
     check.names = FALSE
)


I hope this helps.

On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.m...@syonic.eu> wrote:

     Hello Andrew,


     But "cut" generates factors. In most cases with real data one
     also expects the ends of the interval to be included: the argument
     "include.lowest" is both ugly and too long.

     [The test-code on the ftable thread contains this error! I have
     run into this error a couple of times.]


     The only real situation that I can imagine to be problematic:

     - if the interval goes to +Inf (or -Inf): I do not know if there
     would be any effects when including +Inf (or -Inf).
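
The infinite case can be checked directly with .bincode(). The sketch below is not from the original messages; the break points and the names codes_default and codes_incl are chosen only for illustration. In this small example an infinite break appears to behave like any other break point: -Inf is dropped by default and kept with include.lowest = TRUE.

```r
breaks <- c(-Inf, 0, Inf)
x <- c(-Inf, -1, 0, 1, Inf)

# Default (right = TRUE, include.lowest = FALSE): bins are (-Inf, 0] and (0, Inf]
codes_default <- .bincode(x, breaks)

# include.lowest = TRUE closes the first bin on the left: [-Inf, 0] and (0, Inf]
codes_incl <- .bincode(x, breaks, include.lowest = TRUE)

rbind(codes_default, codes_incl)
```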


     Leonard


     On 9/18/2021 1:14 AM, Andrew Simmons wrote:
     While it is not explicitly mentioned anywhere in the
     documentation for .bincode, I suspect 'include.lowest = FALSE' is
     the default to keep the definitions of the bins consistent. For
     example:


     x <- 0:20
     breaks1 <- seq.int(0, 16, 4)
     breaks2 <- seq.int(0, 20, 4)
     cbind(
         .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
         .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
     )


     By having 'include.lowest = TRUE' with different ends, you can
     get inconsistent behaviour. While this probably wouldn't be an
     issue with 'real' data, this would seem like something you'd want
     to avoid by default. The definitions of the bins are


     [0, 4)
     [4, 8)
     [8, 12)
     [12, 16]


     and


     [0, 4)
     [4, 8)
     [8, 12)
     [12, 16)
     [16, 20]


     so you can see where the inconsistent behaviour comes from. You
     might be able to get R-core to add argument 'warn', but probably
     not to change the default of 'include.lowest'. I hope this helps.
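
The inconsistency described above can be seen by looking at the single value 16, which sits on the last break of the first series but on an interior break of the second. A minimal sketch (the names bin1 and bin2 are illustrative):

```r
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# With breaks1, 16 is the upper end of the closed last bin [12, 16]
bin1 <- .bincode(16, breaks1, right = FALSE, include.lowest = TRUE)

# With breaks2, 16 is the lower end of the bin [16, 20]
bin2 <- .bincode(16, breaks2, right = FALSE, include.lowest = TRUE)

c(bin1, bin2)
```

The same value gets bin code 4 in the first case and 5 in the second, so extending the breaks to the right changes the classification of a value that was already covered.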


     On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.m...@syonic.eu> wrote:

         Thank you Andrew.


         Is there any reason not to make: include.lowest = TRUE the
         default?


         Regarding the NA:

         The user still has to suspect that some values were not
         included and run that test.


         Leonard


         On 9/18/2021 12:53 AM, Andrew Simmons wrote:
         Regarding your first point, argument 'include.lowest'
         already handles this specific case; see ?.bincode

         As for your second point, it could be helpful, but since both
         'cut.default' and '.bincode' return NA if a value isn't
         within a bin, you could write something like this on your own.
         It might be worth pitching to R-bugs as a wishlist item.
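
A user-level wrapper along these lines might look like the sketch below. The function name cut_warn and its warning text are hypothetical and not part of base R; the wrapper only relies on cut() returning NA for values outside every bin, and it switches include.lowest to TRUE by default as requested in the original proposal.

```r
# Hypothetical helper, not part of base R
cut_warn <- function(x, breaks, ..., include.lowest = TRUE) {
  out <- cut(x, breaks, include.lowest = include.lowest, ...)
  # Values that were not NA on input but are NA on output fell outside all bins
  dropped <- sum(is.na(out) & !is.na(x))
  if (dropped > 0L) {
    warning(dropped, " value(s) fall outside the intervals and were set to NA")
  }
  out
}

# Warns, because 17, 18, 19 and 20 are not covered by the breaks
res <- cut_warn(0:20, seq.int(0, 16, 4))
table(res, useNA = "ifany")
```

Other arguments such as right = FALSE are passed through to cut() unchanged.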



         On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help <r-help@r-project.org> wrote:

             Hello List members,


             the following improvements would be useful for function
             cut (and .bincode):


             1.) Argument: Include extremes
             extremes = TRUE
             if(right == FALSE) {
                 # include also right for last interval;
             } else {
                 # include also left for first interval;
             }


             2.) Argument: warn = TRUE

             Warn if any values are not included in the intervals.


             Motivation:
             - reduce risk of errors when using function cut();


             Sincerely,


             Leonard

             ______________________________________________
             R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
             https://stat.ethz.ch/mailman/listinfo/r-help
             PLEASE do read the posting guide
             http://www.R-project.org/posting-guide.html
             and provide commented, minimal, self-contained,
             reproducible code.
