On 12-05-24 7:39 AM, Matthew Dowle wrote:
Duncan Murdoch<murdoch.duncan<at> gmail.com> writes:
On 12-05-23 4:37 AM, Matthew Dowle wrote:
Hi,
I've read ?is.unsorted and searched. Have found a few items but nothing
close, yet. Is the following expected?
is.unsorted(data.frame(1:2))
[1] FALSE
is.unsorted(data.frame(2:1))
[1] FALSE
is.unsorted(data.frame(1:2,3:4))
[1] TRUE
is.unsorted(data.frame(2:1,4:3))
[1] TRUE
IIUC, is.unsorted is intended for atomic vectors only (description of x in
?is.unsorted). Indeed the C source (src/main/sort.c) contains an error
message "only atomic vectors can be tested to be sorted". So that is the
error message I expected to see in all cases above, since I know that
data.frame is not an atomic vector. But there is also this in
?is.unsorted: "except for atomic vectors and objects with a class (where
the >= or > method is used)" which I don't understand. Where >= or > is
used by what, and where?
If you look at the source, you will see that the basic test for classed
objects is
all(x[-1L] >= x[-length(x)])
(in the function base:::.gtn).
This comparison doesn't really make sense for dataframes, but it does
seem to be backwards: that tests that x[2] >= x[1], x[3] >= x[2], etc.,
returning TRUE if all comparisons are TRUE: but that sounds like it
should be is.sorted(), not is.unsorted(). Or is it my brain that is
backwards?
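To make the direction of that test concrete, here is a minimal sketch on plain atomic vectors. The helper name `pairwise_nondecreasing` is made up for illustration and is not part of base R; it only wraps the expression quoted above. Atomic vectors are used so the result does not depend on how any particular R version treats classed objects.

```r
# Hypothetical helper (not in base R): the pairwise test quoted above.
pairwise_nondecreasing <- function(x) all(x[-1L] >= x[-length(x)])

sorted_vec   <- c(1, 2, 2, 5)
unsorted_vec <- c(3, 1, 2)

pairwise_nondecreasing(sorted_vec)    # TRUE: every element >= its predecessor
pairwise_nondecreasing(unsorted_vec)  # FALSE: the second element (1) < the first (3)

# On atomic vectors is.unsorted() answers the opposite question,
# so a classed-object test would be expected to negate the expression.
is.unsorted(sorted_vec)    # FALSE
is.unsorted(unsorted_vec)  # TRUE
```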
Thanks. Yes you're right. So is.unsorted() on a data.frame is trying to tell us
if there exists any unsorted row, it seems.
I would guess that it was never intended to be used this way. It is
intended to test x[1] < x[2] < x[3] ... for objects where this is a
sensible calculation; it isn't really sensible for dataframes.
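The row-wise reading mentioned above follows from how single-bracket indexing works on a data frame: it selects columns, not elements. The sketch below spells that out for a two-column data frame. It compares the columns directly rather than calling is.unsorted(), whose behaviour on data frames may differ between R versions.

```r
DF <- data.frame(a = c(1, 3, 5), b = c(1, 3, 5))

length(DF)               # 2: the number of columns, not rows
names(DF[-1L])           # "b": negative indexing drops column 1
names(DF[-length(DF)])   # "a": drops the last column

# So x[-1L] >= x[-length(x)] lines column b up against column a,
# i.e. it checks a <= b within each row.
row_ok <- DF[[2L]] >= DF[[1L]]
row_ok                   # TRUE TRUE TRUE

DF[2, 2] <- 2
DF[[2L]] >= DF[[1L]]     # TRUE FALSE TRUE: row 2 now has b < a
```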
DF = data.frame(a=c(1,3,5),b=c(1,3,5))
DF
a b
1 1 1 # this row is sorted
2 3 3 # this row is sorted
3 5 5 # this row is sorted
is.unsorted(DF) # going by row but should be !.gtn
[1] TRUE
with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
[1] FALSE
DF[2,2]=2
DF
a b
1 1 1 # this row is sorted
2 3 2 # this row isn't sorted
3 5 5 # this row is sorted
is.unsorted(DF) # going by row but should be !.gtn
[1] FALSE
with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
[1] FALSE
Since it seems to have a bug anyway (and if so, can't be correct in anyone's
use of it), could either is.unsorted on a data.frame return the error that's in
the C code already: "only atomic vectors can be tested to be sorted", for
safety and to lessen confusion, or be changed to return the natural expectation
proposed above? The easiest quick fix would be to negate the result of the .gtn
call of course, but then you could never go back.
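For reference, the "natural expectation" proposed above can be written as a small wrapper. This is only a sketch: `df_rows_unsorted` is a hypothetical name, not an existing or proposed base R function, and it assumes every column can be passed to order().

```r
# Hypothetical helper (not in base R): TRUE if the rows of `df` are not
# already in ascending order by its columns, taken left to right.
df_rows_unsorted <- function(df) {
  # unname() stops column names from matching order()'s own arguments
  # such as `decreasing` or `method`.
  row_order <- do.call(order, unname(as.list(df)))
  # The rows are already sorted exactly when order() returns 1, 2, ..., n.
  is.unsorted(row_order)
}

sorted_df   <- data.frame(a = c(1, 3, 5), b = c(1, 2, 5))
unsorted_df <- data.frame(a = c(1, 1, 2), b = c(2, 1, 3))

df_rows_unsorted(sorted_df)    # FALSE
df_rows_unsorted(unsorted_df)  # TRUE: rows 1 and 2 tie on a, and b is 2 then 1
```

This generalises with(DF, is.unsorted(order(a, b))) to any number of columns without naming them.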
I don't follow the last sentence. If the .gtn call needs to be negated,
why would you want to go back?
Duncan Murdoch
Matthew
Duncan Murdoch
I understand why the first two are FALSE (1 item of anything must be
sorted). I don't understand the 3rd and 4th cases where length is 2:
do_isunsorted seems to call lang3(install(".gtn"), x, CADR(args)). Does
that fall back to TRUE for some reason?
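The "1 item" remark rests on the fact that a data frame's length is its number of columns. A quick check, using the same data frames as the examples above (the variable names are only for illustration):

```r
one_col <- data.frame(1:2)
two_col <- data.frame(1:2, 3:4)

length(one_col)     # 1: a single column, however many rows it has
length(two_col)     # 2

# A data frame is a list of columns, not an atomic vector.
is.atomic(one_col)  # FALSE
is.list(one_col)    # TRUE
```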
Matthew
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.0
loaded via a namespace (and not attached):
[1] tools_2.15.0
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel