Re: [Rd] rank(, ties.method="last")

2015-10-21 Thread Henric Winell

On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
> Marius Hofert-4--
>> On 2015-10-09 at 12:14, Martin Maechler wrote:
>> I think so: the code above doesn't seem to do the right thing.  Consider
>> the following example:
>>
>>   > x <- c(1, 1, 2, 3)
>>   > rank2(x, ties.method = "last")
>> [1] 1 2 4 3
>>
>> That doesn't look right to me -- I had expected
>>
>>   > rev(sort.list(x, decreasing = TRUE))
>> [1] 2 1 3 4
>>
>
> Indeed, well spotted, that seems to be correct.
>
>>
>> Henric Winell
>>
> --
>
> In the particular example (of length 4), what is really wanted is the
> following.
> ind <- integer(4)
> ind[sort.list(x, decreasing=TRUE)] <- 4:1
> ind

You don't provide the output here, but 'ind' is, of course,

  > ind
[1] 2 1 3 4

> The following gives the desired result:
> sort.list(rev(sort.list(x, decreasing=TRUE)))

And, again, no output, but

  > sort.list(rev(sort.list(x, decreasing=TRUE)))
[1] 2 1 3 4

Why is it necessary to use 'sort.list' on the result from
'rev(sort.list(...'?


Henric Winell





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] rank(, ties.method="last")

2015-10-21 Thread Martin Maechler
> Henric Winell 
> on Wed, 21 Oct 2015 13:43:02 +0200 writes:

> On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
>> Marius Hofert-4--
>>> On 2015-10-09 at 12:14, Martin Maechler wrote:
>>> I think so: the code above doesn't seem to do the right thing.  Consider
>>> the following example:
>>> 
>>> > x <- c(1, 1, 2, 3)
>>> > rank2(x, ties.method = "last")
>>> [1] 1 2 4 3
>>> 
>>> That doesn't look right to me -- I had expected
>>> 
>>> > rev(sort.list(x, decreasing = TRUE))
>>> [1] 2 1 3 4
>>> 
>> 
>> Indeed, well spotted, that seems to be correct.
>> 
>>> 
>>> Henric Winell
>>> 
>> --
>> 
>> In the particular example (of length 4), what is really wanted is the
>> following.
>> ind <- integer(4)
>> ind[sort.list(x, decreasing=TRUE)] <- 4:1
>> ind

> You don't provide the output here, but 'ind' is, of course,

>   > ind
> [1] 2 1 3 4

>> The following gives the desired result:
>> sort.list(rev(sort.list(x, decreasing=TRUE)))

> And, again, no output, but

>   > sort.list(rev(sort.list(x, decreasing=TRUE)))
> [1] 2 1 3 4

> Why is it necessary to use 'sort.list' on the result from
> 'rev(sort.list(...'?

You can try all kinds of code on this *too* simple example and do
experiments.  But let's approach this a bit more scientifically
and hence systematically:

Look at  rank  {the R function definition} to see that
for the case of no NA's,

 rank(x, ties.method = "first")   ===  sort.list(sort.list(x))

If you assume that to be correct and want to define "last" to be
correct as well (in the sense of being  "first"-consistent), 
it is clear that

  rank(x, ties.method = "last")   ===  rev(sort.list(sort.list(rev(x))))

must also be correct.  I don't think that *any* of the proposals
so far had a correct version [but the too simplistic examples
did not show the problems].

In  R-devel (the R development) version of today, i.e., svn
revision >= 69549, the implementation of  ties.method = "last"
uses
## == rev(sort.list(sort.list(rev(x)))) :
if(length(x) == 0) integer(0)
else { i <- length(x):1L
   sort.list(sort.list(x[i]))[i] },

which is equivalent to using rev() but a bit more efficient.
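
As a quick check of the two identities above, here is a small sketch. It is not taken from the R sources: the helper names rank_first, rank_last_rev and rank_last_idx are made up for illustration, and the input is assumed to contain no NAs. The test vector is chosen so that its ordering permutation is not its own inverse, unlike the length-4 example.

```r
# Hypothetical helpers, named here only for illustration; no NAs assumed.
rank_first    <- function(x) sort.list(sort.list(x))
rank_last_rev <- function(x) rev(sort.list(sort.list(rev(x))))
rank_last_idx <- function(x) {
  if (length(x) == 0) return(integer(0))
  i <- length(x):1L                 # reversed index vector
  sort.list(sort.list(x[i]))[i]     # same as rev(...(rev(x))) without rev()
}

x <- c(3, 1, 2, 1, 3, 2)            # every value is tied once

# "first" agrees with base R's rank() when there are no NAs.
stopifnot(all(rank_first(x) == rank(x, ties.method = "first")))

# The rev() form and the index form of "last" agree.
stopifnot(all(rank_last_rev(x) == rank_last_idx(x)))

rank_first(x)      # 5 1 3 2 6 4
rank_last_rev(x)   # 6 2 4 1 5 3
```

Within each group of ties, "first" hands out increasing ranks from left to right, while "last" hands them out from right to left.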

Martin Maechler, ETH Zurich



[Rd] Confusing print method for Inf dates

2015-10-21 Thread Hadley Wickham
x <- as.Date(Inf, origin = "1970-01-01")
x
#> [1] NA
str(x)
#>  Date[1:1], format: NA
unclass(x)
#> [1] Inf

It's not clear what the correct behaviour is. The documentation for
?Date has: "It is intended that the date should be an integer,", which
suggests that -Inf and Inf are not valid dates. But if that's true the
behaviour for max.Date() needs some thought:

max(as.Date(NA), na.rm = TRUE)
#> Warning in max.default(structure(NA_real_, class = "Date"), na.rm = TRUE):
#> no non-missing arguments to max; returning -Inf
#> [1] NA

If dates are integers, then there is no date that is smaller than all
other dates, so it's not clear what max() should return - NA?

Hadley

-- 
http://had.co.nz/



Re: [Rd] Confusing print method for Inf dates

2015-10-21 Thread Joshua Ulrich
On Wed, Oct 21, 2015 at 4:57 PM, Hadley Wickham wrote:
> x <- as.Date(Inf, origin = "1970-01-01")
> x
> #> [1] NA
> str(x)
> #>  Date[1:1], format: NA
> unclass(x)
> #> [1] Inf
>
> It's not clear what the correct behaviour is. The documentation for
> ?Date has: "It is intended that the date should be an integer,", which
> suggests that -Inf and Inf are not valid dates. But if that's true the

You omitted the second half of the sentence, which contains important
information.  The entire sentence is, "It is intended that the date
should be an integer, but this is not enforced in the internal
representation."  Since it's not enforced internally, it doesn't
necessarily follow that non-integer values are invalid dates.  The
rest of the paragraph describes how fractional (internal) dates can be
created.
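
To illustrate the point about fractional internal values, here is a small sketch. The variable names are made up, and only the internal numbers are shown, since printed output may differ between R versions.

```r
# A whole-day date and a fractional one; the origin is given explicitly.
d_whole <- as.Date("2015-10-21")
d_frac  <- as.Date(0.5, origin = "1970-01-01")

unclass(d_whole)   # 16729, days since 1970-01-01
unclass(d_frac)    # 0.5, the fractional day is stored as-is

# Averaging two consecutive dates also gives a fractional internal value.
d_mean <- mean(as.Date(c("2015-10-21", "2015-10-22")))
unclass(d_mean)    # 16729.5
```

Nothing in these calls rejects a non-integer value, which is consistent with the documentation's statement that the integer intent is not enforced.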

Both ?format.Date and ?strptime say that 'NA' dates/times are printed
as NA_character_.  It might be clearer to say that "invalid"
dates/times are printed as NA_character_.

> behaviour for max.Date() needs some thought:
>
> max(as.Date(NA), na.rm = TRUE)
> #> Warning in max.default(structure(NA_real_, class = "Date"), na.rm = TRUE):
> #> no non-missing arguments to max; returning -Inf
> #> [1] NA
>
> If dates are integers, then there is no date that is smaller than all
> other dates, so it's not clear what max() should return - NA?
>
But they're not integers in the strict sense.

> Hadley
>
> --
> http://had.co.nz/



-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com



[Rd] (no subject)

2015-10-21 Thread Suharto Anggono Suharto Anggono via R-devel
--

You can try all kinds of code on this *too* simple example and do
experiments.  But let's approach this a bit more scientifically
and hence systematically:

Look at  rank  {the R function definition} to see that
for the case of no NA's,

 rank(x, ties.method = "first")   ===  sort.list(sort.list(x))

If you assume that to be correct and want to define "last" to be
correct as well (in the sense of being  "first"-consistent),
it is clear that

  rank(x, ties.method = "last")   ===  rev(sort.list(sort.list(rev(x))))

must also be correct.  I don't think that *any* of the proposals
so far had a correct version [but the too simplistic examples
did not show the problems].

In  R-devel (the R development) version of today, i.e., svn
revision >= 69549, the implementation of  ties.method = "last"
uses
## == rev(sort.list(sort.list(rev(x)))) :
if(length(x) == 0) integer(0)
else { i <- length(x):1L
   sort.list(sort.list(x[i]))[i] },

which is equivalent to using rev() but a bit more efficient.

Martin Maechler, ETH Zurich 
--

I'll defend that my code is correct in general.

It all comes from the fact that, if p is a permutation of 1:n,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives the same result as
sort.list(p)
You can make sense of it like this. After ind[p] <- 1:n, ind[1] is the position
where p == 1. So, ind[1] is the position of the smallest element of p, which is
the first element of sort.list(p). The remaining elements follow in the same way.

That's why 'sort.list' is used for ties.method="first" and ties.method="random" 
in function 'rank' in R. When p gives the desired order,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives ranks of the original elements based on the order. The original element 
in position p[1] has rank 1, the original element in position p[2] has rank 2, 
and so on.

Now, I say that rev(sort.list(x, decreasing=TRUE)) gives the desired order for
ties.method="last". In that order, the elements run from smallest to largest,
and equal elements appear by position from last to first.
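
A minimal sketch of this argument follows. The vector x and the helper name inverse_perm are made up for illustration; the sketch assumes no NAs and that sort.list() leaves tied elements in their original order.

```r
# Hypothetical helper: the inverse of a permutation p of 1:n.
inverse_perm <- function(p) {
  ind <- integer(length(p))
  ind[p] <- seq_along(p)
  ind
}

x <- c(3, 1, 2, 1, 3, 2)

# Order for ties.method = "last": ascending values, ties taken back to front.
p <- rev(sort.list(x, decreasing = TRUE))
p            # 4 2 6 3 5 1
x[p]         # 1 1 2 2 3 3

# Inverting the order gives the ranks, and sort.list(p) does the same thing.
ranks_last <- inverse_perm(p)
ranks_last   # 6 2 4 1 5 3
stopifnot(all(ranks_last == sort.list(p)))

# The result also matches the rev()-based definition quoted above.
stopifnot(all(ranks_last == rev(sort.list(sort.list(rev(x))))))
```

Here p and sort.list(p) differ, which the length-4 example could not show because its order 2 1 3 4 happens to be its own inverse.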
