Re: [R] Improvement: function cut

2021-09-18 Thread David Winsemius
On 9/18/21 5:28 AM, Leonard Mada via R-help wrote: Hello Andrew, I add this info for completeness (so other users can get a better understanding): If we want to perform a survival analysis, then the interval should be closed to the right, but we should also include the first time point (as pe

Re: [R] Improvement: function cut

2021-09-18 Thread Leonard Mada via R-help
Hello Andrew, I add this info for completeness (so other users can get a better understanding): If we want to perform a survival analysis, then the interval should be closed to the right, but we should also include the first time point (as per Intention-to-Treat): [0, 4], (4, 8], (8, 12], (12, 16]
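
The binning described above can be sketched with the existing arguments of cut(). This is only an illustration: the follow-up times and the break points are made-up values, not data from the thread.

```r
# Hypothetical follow-up times in months; 0 is the baseline visit.
time <- c(0, 1, 4, 5, 8, 12, 16)
breaks <- seq(0, 16, by = 4)

# right = TRUE closes every bin on the right; include.lowest = TRUE
# additionally closes the first bin on the left, so time 0 is kept.
bins <- cut(time, breaks, right = TRUE, include.lowest = TRUE)
levels(bins)
table(bins, useNA = "ifany")

# Without include.lowest, the baseline time point is returned as NA.
dropped <- cut(time, breaks, right = TRUE)
time[is.na(dropped)]
```

With the default include.lowest = FALSE, the value at the lowest break falls outside every bin, which is the case the message is concerned about.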

Re: [R] Improvement: function cut

2021-09-17 Thread Bert Gunter
Perhaps you and Andrew should take this discussion off list... Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip) On Fri, Sep 17, 2021 at 3:45 PM Leonard Mada via R-he

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
The warning should be in cut() => .bincode(). It should be generated whenever a real value (excluding NA, NaN, or +/- Inf) is not included in any of the bins. If the user writes a script and doesn't want any warnings, they can set warn = FALSE. But otherwise it would be very helpful to catch
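
The requested behaviour can be approximated today with a small wrapper around cut(). The sketch below is an assumption about what such a check might look like: cut_warn and its warn argument are hypothetical names, and neither cut() nor .bincode() in base R has this option.

```r
# Hypothetical helper (not part of base R): cut() plus a warning when
# finite values fall outside every bin.
cut_warn <- function(x, breaks, ..., warn = TRUE) {
  res <- cut(x, breaks, ...)
  # Finite inputs that still came back NA were not covered by any bin.
  # NA, NaN and +/- Inf are not finite, so they never trigger the warning.
  missed <- is.finite(x) & is.na(res)
  if (warn && any(missed)) {
    warning(sum(missed), " finite value(s) not included in any bin")
  }
  res
}

x <- c(0:16, NA, Inf)
res <- cut_warn(x, seq(0, 16, by = 4))                 # warns: the value 0 is unbinned
quiet <- cut_warn(x, seq(0, 16, by = 4), warn = FALSE) # same result, no warning
```

A wrapper keeps the check outside of cut() itself, so existing scripts are unaffected; whether the option belongs in base R is the question the thread is debating.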

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Why would you want to merge different factors? It makes no sense on real data. Even if some names are the same, the factors are not the same! The only real-data application that springs to mind is censoring (right or left, depending on the choice): but here we have both open and closed interv

Re: [R] Improvement: function cut

2021-09-17 Thread Jeff Newmiller
Your objection that "the user has to suspect that some values were not included" applies equally to your proposed warn option. There are a lot of ways to introduce NAs... in real projects all analysts should be suspecting this problem. On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Hello Andrew, But "cut" generates factors. In most cases with real data one also expects to have the ends of the interval: the argument "include.lowest" is both ugly and too long. [The test code on the ftable thread contains this error! I have run into this error a couple of times.] The

Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
I disagree, I don't really think it's too long or ugly, but if you think it is, you could abbreviate it as 'i'. x <- 0:20 breaks1 <- seq.int(0, 16, 4) breaks2 <- seq.int(0, 20, 4) data.frame( cut(x, breaks1, right = FALSE, i = TRUE), cut(x, breaks2, right = FALSE, i = TRUE), check.nam
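
The abbreviation works because R matches argument names partially, and include.lowest is the only formal argument of cut.default that starts with "i". A minimal check, reusing the x and break values from the message:

```r
x <- 0:20
breaks <- seq.int(0, 20, 4)

# Full argument name versus the partially matched abbreviation.
full  <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
short <- cut(x, breaks, right = FALSE, i = TRUE)

identical(full, short)
```

Partial matching is convenient interactively, but spelling the name out is usually the safer choice in scripts, since an abbreviation can become ambiguous if a function gains new arguments.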

Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
While it is not explicitly mentioned anywhere in the documentation for .bincode, I suspect 'include.lowest = FALSE' is the default to keep the definitions of the bins consistent. For example: x <- 0:20 breaks1 <- seq.int(0, 16, 4) breaks2 <- seq.int(0, 20, 4) cbind( .bincode(x, breaks1, right
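
One way to read the consistency argument, sketched with the same x and breaks as above (the comparison itself is an illustration, not the truncated code from the message): with include.lowest = TRUE, the bin assigned to a value can depend on where the break vector happens to end.

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# With include.lowest = TRUE and right = FALSE, the last bin is closed on
# the right, so 16 lands in bin 4 for breaks1 but in bin 5 for breaks2.
a <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
conflict <- x[!is.na(a) & a != b]
conflict

# With the default include.lowest = FALSE, every value binned by breaks1
# receives the same bin number under breaks2.
a0 <- .bincode(x, breaks1, right = FALSE)
b0 <- .bincode(x, breaks2, right = FALSE)
consistent <- all(a0[!is.na(a0)] == b0[!is.na(a0)])
consistent
```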

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Thank you Andrew. Is there any reason not to make include.lowest = TRUE the default? Regarding the NA: The user still has to suspect that some values were not included and run that test. Leonard On 9/18/2021 12:53 AM, Andrew Simmons wrote: > Regarding your first point, argument 'include.

Re: [R] Improvement: function cut

2021-09-17 Thread Andrew Simmons
Regarding your first point, argument 'include.lowest' already handles this specific case; see ?.bincode. As for your second point, it could be helpful, but since both 'cut.default' and '.bincode' return NA if a value isn't within a bin, you could make something like this on your own. Might be worth
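
As a quick illustration of the first point: despite its name, include.lowest closes whichever end would otherwise be left open, depending on right. The break values here are arbitrary.

```r
x <- 0:16
breaks <- seq.int(0, 16, 4)

# right = TRUE: bins are (a, b]; include.lowest closes the first bin on the left.
r_closed <- cut(x, breaks, right = TRUE, include.lowest = TRUE)
levels(r_closed)

# right = FALSE: bins are [a, b); include.lowest closes the last bin on the right.
l_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
levels(l_closed)

# In both cases no value of x is left unbinned.
c(anyNA(r_closed), anyNA(l_closed))
```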

[R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Hello List members, the following improvements would be useful for function cut (and .bincode):
1.) Argument: Include extremes
extremes = TRUE
if(right == FALSE) {
   # include also right for last interval;
} else {
   # include also left for first interval;
}
2.) Argument: warn = TRUE Warn