Re: [Rd] rank(, ties.method="last")
Den 2015-10-21 kl. 07:24, skrev Suharto Anggono Suharto Anggono via R-devel:
> Marius Hofert-4--
>> Den 2015-10-09 kl. 12:14, skrev Martin Maechler:
>> I think so: the code above doesn't seem to do the right thing. Consider
>> the following example:
>>
>> > x <- c(1, 1, 2, 3)
>> > rank2(x, ties.method = "last")
>> [1] 1 2 4 3
>>
>> That doesn't look right to me -- I had expected
>>
>> > rev(sort.list(x, decreasing = TRUE))
>> [1] 2 1 3 4
>>
>
> Indeed, well spotted, that seems to be correct.
>
>>
>> Henric Winell
>>
> --
>
> In the particular example (of length 4), what is really wanted is the following.
> ind <- integer(4)
> ind[sort.list(x, decreasing=TRUE)] <- 4:1
> ind

You don't provide the output here, but 'ind' is, of course,

> ind
[1] 2 1 3 4

> The following gives the desired result:
> sort.list(rev(sort.list(x, decreasing=TRUE)))

And, again, no output, but

> sort.list(rev(sort.list(x, decreasing=TRUE)))
[1] 2 1 3 4

Why is it necessary to use 'sort.list' on the result from 'rev(sort.list(...'?

Henric Winell

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
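Henric's question can be probed with a small experiment (the vector below is an arbitrary choice, not from the thread). For c(1, 1, 2, 3) the order returned by rev(sort.list(...)) happens to coincide with the ranks, because that permutation, c(2, 1, 3, 4), is its own inverse. For x <- c(2, 3, 1) it is not, and the outer sort.list() is what turns an order into ranks:

```r
# Order vs. ranks: why the outer sort.list() is needed.
x <- c(2, 3, 1)                       # no ties, so every ties.method gives the same ranks

ord <- rev(sort.list(x, decreasing = TRUE))
ord                                   # an order: indices of x from smallest to largest
#> [1] 3 1 2

rk <- sort.list(ord)                  # inverting the order gives the ranks
rk
#> [1] 2 3 1

rank(x)                               # base R's ranks, for comparison
#> [1] 2 3 1
```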
Re: [Rd] rank(, ties.method="last")
> Henric Winell
>     on Wed, 21 Oct 2015 13:43:02 +0200 writes:

> Den 2015-10-21 kl. 07:24, skrev Suharto Anggono Suharto Anggono via R-devel:
>> Marius Hofert-4--
>>> Den 2015-10-09 kl. 12:14, skrev Martin Maechler:
>>> I think so: the code above doesn't seem to do the right thing. Consider
>>> the following example:
>>>
>>> > x <- c(1, 1, 2, 3)
>>> > rank2(x, ties.method = "last")
>>> [1] 1 2 4 3
>>>
>>> That doesn't look right to me -- I had expected
>>>
>>> > rev(sort.list(x, decreasing = TRUE))
>>> [1] 2 1 3 4
>>>
>>
>> Indeed, well spotted, that seems to be correct.
>>
>>>
>>> Henric Winell
>>>
>> --
>>
>> In the particular example (of length 4), what is really wanted is the following.
>> ind <- integer(4)
>> ind[sort.list(x, decreasing=TRUE)] <- 4:1
>> ind

> You don't provide the output here, but 'ind' is, of course,

>> ind
> [1] 2 1 3 4

>> The following gives the desired result:
>> sort.list(rev(sort.list(x, decreasing=TRUE)))

> And, again, no output, but

>> sort.list(rev(sort.list(x, decreasing=TRUE)))
> [1] 2 1 3 4

> Why is it necessary to use 'sort.list' on the result from
> 'rev(sort.list(...'?

You can try all kinds of code on this *too* simple example and experiment.
But let's approach this a bit more scientifically, and hence systematically:
look at rank {the R function definition} to see that, for the case of no NAs,

   rank(x, ties.method = "first")  ===  sort.list(sort.list(x))

If you assume that to be correct and want to define "last" so that it is
correct as well (in the sense of being "first"-consistent), it is clear that

   rank(x, ties.method = "last")  ===  rev(sort.list(sort.list(rev(x))))

must also be correct.

I don't think that *any* of the proposals so far had a correct version
[but the too simplistic examples did not show the problems].
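Martin's two identities can be checked directly on a vector with several tied groups. The vector is an arbitrary choice and has no NAs, since both identities are stated for the no-NA case. The sketch also compares Suharto's formula from the thread, and the final rank() check assumes R >= 3.3.0, the first release in which rank() accepts ties.method = "last":

```r
# Arbitrary test vector with several tied groups and no NAs.
x <- c(3, 1, 2, 1, 3, 2, 1)

# Identity 1: "first" ranks are the inverse of the (stable) order.
first <- sort.list(sort.list(x))
stopifnot(rank(x, ties.method = "first") == first)

# Identity 2: "last" is "first" applied to the reversed vector, reversed back.
last_rev <- rev(sort.list(sort.list(rev(x))))

# Suharto's formula from the thread, for comparison.
last_suharto <- sort.list(rev(sort.list(x, decreasing = TRUE)))

stopifnot(identical(last_rev, last_suharto))
stopifnot(rank(x, ties.method = "last") == last_rev)  # assumes R >= 3.3.0
last_rev
#> [1] 7 3 5 2 6 4 1
```

Within each tied group, the later position gets the smaller rank (the three 1s at positions 7, 4 and 2 get ranks 1, 2 and 3), which is what "last" is meant to do.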
In R-devel (the R development version) of today, i.e., svn revision >= 69549,
the implementation of ties.method = "last" uses

      ## == rev(sort.list(sort.list(rev(x))))
      if(length(x) == 0) integer(0) else {
          i <- length(x):1L
          sort.list(sort.list(x[i]))[i]
      }

which is equivalent to using rev() but a bit more efficient.

Martin Maechler, ETH Zurich
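For experimentation, the R-devel code can be wrapped into a stand-alone function. The name rank_last() is hypothetical (it is not part of R), and like the identities above it assumes x has no NAs. The length-0 guard matters because, when x is empty, length(x):1L is 0:1, i.e. c(0L, 1L), so the "reversed" index would not be empty:

```r
# Hypothetical helper, not part of base R: "last" ranks for an NA-free vector.
rank_last <- function(x) {
  if (length(x) == 0) return(integer(0))
  i <- length(x):1L                   # reversing index: x[i] is the same as rev(x)
  sort.list(sort.list(x[i]))[i]       # "first" ranks of rev(x), read back in reverse
}

rank_last(c(1, 1, 2, 3))
#> [1] 2 1 3 4

# Why the guard is needed: 0L:1L has length 2, and indexing an empty
# vector with it silently yields a length-one NA instead of an empty result.
length(0L:1L)
#> [1] 2
numeric(0)[0L:1L]
#> [1] NA
```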
[Rd] Confusing print method for Inf dates
x <- as.Date(Inf, origin = "1970-01-01")
x
#> [1] NA
str(x)
#> Date[1:1], format: NA
unclass(x)
#> [1] Inf

It's not clear what the correct behaviour is. The documentation for ?Date has: "It is intended that the date should be an integer,", which suggests that -Inf and Inf are not valid dates. But if that's true, the behaviour for max.Date() needs some thought:

max(as.Date(NA), na.rm = TRUE)
#> Warning in max.default(structure(NA_real_, class = "Date"), na.rm = TRUE):
#> no non-missing arguments to max; returning -Inf
#> [1] NA

If dates are integers, then there is no date that is smaller than all other dates, so it's not clear what max() should return: NA?

Hadley

--
http://had.co.nz/
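The point of the example can be made explicit with is.na(), is.finite() and unclass(): a Date holding Inf is not missing, even though it printed as NA in the R version used in the thread (later versions may print it differently, so the sketch does not rely on the printed form). The same holds for the -Inf that max() returns when there is nothing left after removing NAs:

```r
x <- as.Date(Inf, origin = "1970-01-01")

is.na(x)        # FALSE: the value is not missing ...
is.finite(x)    # FALSE: ... it is +Inf underneath
unclass(x)      # the raw number of days since 1970-01-01

# max() over no non-missing values: a warning, and -Inf kept in the Date class.
m <- suppressWarnings(max(as.Date(NA), na.rm = TRUE))
class(m)
unclass(m)
```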
Re: [Rd] Confusing print method for Inf dates
On Wed, Oct 21, 2015 at 4:57 PM, Hadley Wickham wrote:
> x <- as.Date(Inf, origin = "1970-01-01")
> x
> #> [1] NA
> str(x)
> #> Date[1:1], format: NA
> unclass(x)
> #> [1] Inf
>
> It's not clear what the correct behaviour is. The documentation for
> ?Date has: "It is intended that the date should be an integer,", which
> suggests that -Inf and Inf are not valid dates. But if that's true the

You omitted the second half of the sentence, which contains important information. The entire sentence is, "It is intended that the date should be an integer, but this is not enforced in the internal representation." Since it's not enforced internally, it doesn't necessarily follow that non-integer values are invalid dates. The rest of the paragraph describes how fractional (internal) dates can be created.

Both ?format.Date and ?strptime say that 'NA' dates/times are printed as NA_character_. It might be clearer to say that "invalid" dates/times are printed as NA_character_.

> behaviour for max.Date() needs some thought:
>
> max(as.Date(NA), na.rm = TRUE)
> #> Warning in max.default(structure(NA_real_, class = "Date"), na.rm = TRUE):
> #> no non-missing arguments to max; returning -Inf
> #> [1] NA
>
> If dates are integers, then there is no date that is smaller than all
> other dates, so it's not clear what max() should return - NA?
>

But they're not integers in the strict sense.

> Hadley

--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
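Joshua's point that integrality is not enforced can be seen with a fractional Date (0.5 is an illustrative value, not from the thread): the internal value keeps its fraction, format() shows only the calendar day, and comparisons still see the fraction:

```r
d <- as.Date(0.5, origin = "1970-01-01")   # half a day after the origin
unclass(d)                                 # the fraction is kept internally
#> [1] 0.5
format(d)                                  # but only the calendar day is displayed
#> [1] "1970-01-01"
d == as.Date("1970-01-01")                 # and it is not equal to the whole day
#> [1] FALSE
```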
[Rd] (no subject)
I maintain that my code is correct in general. It all follows from the fact that, if p is a permutation of 1:n,

   { ind <- integer(n); ind[p] <- 1:n; ind }

gives the same result as

   sort.list(p)

You can make sense of it like this. In ind[p] <- 1:n, ind[1] is the position where p == 1. So ind[1] is the position of the smallest element of p, i.e., it is the first element of sort.list(p). The remaining elements follow in the same way. That's why 'sort.list' is used for ties.method="first" and ties.method="random" in function 'rank' in R.

When p gives the desired order, { ind <- integer(n); ind[p] <- 1:n; ind } gives the ranks of the original elements based on that order: the original element in position p[1] has rank 1, the original element in position p[2] has rank 2, and so on.

Now, I say that rev(sort.list(x, decreasing=TRUE)) gives the desired order for ties.method="last". With that order, the elements go from smallest to largest; equal elements are ordered by their positions backwards. Applying sort.list to that order therefore turns it into the ranks.
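Suharto's key identity, that scattering 1:n into the positions given by p inverts the permutation p, can be checked on random permutations. The seed, the size n = 10 and the 100 trials are arbitrary choices for this sketch:

```r
set.seed(1)                                # arbitrary seed, for reproducibility
n <- 10
for (trial in 1:100) {
  p <- sample(n)                           # a random permutation of 1:n
  ind <- integer(n)
  ind[p] <- 1:n                            # scatter: ind[p[k]] <- k
  stopifnot(identical(ind, sort.list(p)))  # same result as sort.list(p)
  # ind is the inverse permutation of p, in both directions
  stopifnot(identical(ind[p], 1:n), identical(p[ind], 1:n))
}
```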