Hi Martin,

On 04/24/2013 02:29 AM, Martin Maechler wrote:
Dear Herve,

Hervé Pagès <hpa...@fhcrc.org>
     on Tue, 23 Apr 2013 23:09:21 -0700 writes:

     > Hi,

     > In the man page for is.unsorted():

     >    Value:

     >       A length-one logical value.  All objects of length 0
     > or 1 are sorted: the result will be ‘NA’ for objects of
     > length 2 or more except for atomic vectors and objects
     > with a class (where the ‘>=’ or ‘>’ method is used to
     > compare ‘x[i]’ with ‘x[i-1]’ for ‘i’ in ‘2:length(x)’).

     > This contains many incorrect statements:

     >> length(NA)
     >       [1] 1
     >> is.unsorted(NA)
     >       [1] NA
     >> length(list(NA))
     >       [1] 1
     >> is.unsorted(list(NA))
     >       [1] NA

     > => Contradicts "all objects of length 0 or 1 are sorted".

     >> is.unsorted(raw(2))
     >       Error in is.unsorted(raw(2)) : unimplemented type
     > 'raw' in 'isUnsorted'

     > => Doesn't agree with the doc (unless "except for atomic
     > vectors" means "it might fail for atomic vectors").

     >> setClass("A", representation(aa="integer"))
     >> a <- new("A", aa=4:1)
     >> length(a)
     >       [1] 1

     >> is.unsorted(a)
     >       [1] FALSE
     >       Warning message:
     >       In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

     > => Ok, but it's arguable whether the warning is useful/justified
     > from a user's point of view. The warning *seems* to suggest
     > that defining an "is.na" method for my objects is required
     > for is.unsorted() to work properly, but the doc doesn't
     > make this clear.

     > Anyway, let's define one, so the warning goes away:

     >> setMethod("is.na", "A", function(x) is.na(x@aa))
     >       [1] "is.na"

     > Let's define a "length" method:

     >> setMethod("length", "A", function(x) length(x@aa))
     >       [1] "length"
     >> length(a)
     >       [1] 4

     >> is.unsorted(a)
     >       [1] FALSE

     > => Is this correct? Hard to know. The doc is not clear
     > about what should happen for objects of length 2 or more
     > and with a class but with no ">=" or ">" methods.

     > Let's define "[", ">=", and ">":

     >> setMethod("[", "A", function(x, i, j, ..., drop=TRUE)
     >>     new("A", aa=x@aa[i]))
     >       [1] "["
     >> rev(a)
     >       An object of class "A"
     >       Slot "aa":
     >       [1] 1 2 3 4

     >> setMethod(">=", c("A", "A"), function(e1, e2) {e1@aa >= e2@aa})
     >       [1] ">="
     >> a >= a[3]
     >       [1] TRUE TRUE TRUE FALSE

     >> setMethod(">", c("A", "A"), function(e1, e2) {e1@aa > e2@aa})
     >       [1] ">"
     >> a > a[3]
     >       [1] TRUE TRUE FALSE FALSE

     >> is.unsorted(a)
     >       [1] FALSE

     >> is.unsorted(rev(a))
     >      [1] FALSE

     > Still not working as expected. So what's required exactly
     > for making is.unsorted() work on an object "with a class"?

Well, read the source code. :-) ;-)

More seriously: on another hidden help page, you will find

   \code{.gt} and \code{.gtn} are callbacks from \code{\link{rank}} and
   \code{\link{is.unsorted}} used for classed objects.

In other words, you'd need to define a method for
  .gtn for S4 objects in this case.

Ah, good to know.


... yes, indeed, I don't know why this is not documented at all.
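For illustration, here is a minimal sketch of the kind of comparison such a callback has to perform, assuming that `.gtn(x, strictly)` should return TRUE when `x` is *not* in (strictly) increasing order. `gtn_sketch()` is a hypothetical name, not the actual base R function; class "A" and its `length`, `[`, `>=` and `>` methods reuse the definitions from the example above. The sketch only shows that those methods suffice for the comparison itself; it does not show how is.unsorted() dispatches to the callback.

```r
library(methods)

## Hypothetical stand-in for a .gtn()-style callback (NOT the base R
## function): returns TRUE if x is not sorted, comparing each element
## with its predecessor using ">" (strictly) or ">=" (otherwise).
gtn_sketch <- function(x, strictly) {
    n <- length(x)
    if (n < 2L) return(FALSE)
    if (strictly) !all(x[-1L] > x[-n]) else !all(x[-1L] >= x[-n])
}

## The S4 class and methods from the example in this thread.
setClass("A", representation(aa = "integer"))
setMethod("length", "A", function(x) length(x@aa))
setMethod("[", "A", function(x, i, j, ..., drop = TRUE)
    new("A", aa = x@aa[i]))
setMethod(">=", c("A", "A"), function(e1, e2) e1@aa >= e2@aa)
setMethod(">", c("A", "A"), function(e1, e2) e1@aa > e2@aa)

a <- new("A", aa = 4:1)
gtn_sketch(a, strictly = FALSE)                   # TRUE: 4:1 is decreasing
gtn_sketch(new("A", aa = 1:4), strictly = FALSE)  # FALSE
```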



     > BTW, is.unsorted() would be *much* faster, at least on
     > atomic vectors, without those calls to is.na().

Well, in all R versions, apart from R-devel as of yesterday,
the source of is.unsorted() has been

   is.unsorted <- function(x, na.rm = FALSE, strictly = FALSE)
   {
       if(is.null(x)) return(FALSE)
       if(!na.rm && any(is.na(x)))## "FIXME" is.na(<large>) is "too slow"
          return(NA)
       ## else
       if(na.rm && any(ii <- is.na(x)))
          x <- x[!ii]
       .Internal(is.unsorted(x, strictly))
   }

so you see the "FIXME".

In R-devel (and probably R-patched in the near future),
that line is

       if(!na.rm && anyMissing(x))

so there's no slow code anymore, at least not for the default
case of na.rm = FALSE.


     > The C code
     > could check for NAs without having to do this as a first
     > pass on the full vector, as is the case with the
     > current implementation. If the vector is unsorted, the C
     > code is typically able to bail out early, so the speed-up
     > will typically be 10000x or more if the vector has millions
     > of elements.
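The single-pass idea can be sketched as one loop that checks for NAs and for order at the same time. It is written in R only for readability (such a loop would be slow in R itself; the point is the control flow, which would live in the C code). `is_unsorted_one_pass()` is a hypothetical name, not part of R, and the sketch only considers atomic vectors with na.rm = FALSE.

```r
## Hypothetical single-pass check for an atomic vector x (NOT R's actual
## C code). Returns NA at the first NA, TRUE at the first out-of-order
## pair, and FALSE if the end of x is reached.
is_unsorted_one_pass <- function(x, strictly = FALSE) {
    n <- length(x)
    if (n == 0L) return(FALSE)
    if (is.na(x[1L])) return(NA)
    for (i in seq_len(n)[-1L]) {
        if (is.na(x[i])) return(NA)      # NA met before any descent
        if (x[i] < x[i - 1L] || (strictly && x[i] == x[i - 1L]))
            return(TRUE)                 # bail out early: rest of x is never scanned
    }
    FALSE
}

is_unsorted_one_pass(c(1, NA, 0))  # NA
is_unsorted_one_pass(c(3, 1, NA))  # TRUE, the NA is never reached
```

Note the change in semantics: the is.unsorted() source quoted earlier in this thread returns NA whenever x contains an NA and na.rm = FALSE, whereas this sketch returns TRUE for c(3, 1, NA) because it stops at the first descent. A real patch would have to either accept that difference or keep scanning for NAs after a descent, which gives up part of the early exit.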

You are right (but again: the most important case, na.rm = FALSE,
has already been "solved", I'd say),
but you know well that we do gratefully accept good patches to
the R sources.

Will do. Thanks!

H.



     > Thanks, H.

     >> sessionInfo()
     > R version 3.0.0 (2013-04-03)
     > Platform: x86_64-unknown-linux-gnu (64-bit)

     > locale:
     >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     >  [7] LC_PAPER=C                 LC_NAME=C
     >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
     > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

     > attached base packages:
     > [1] stats     graphics  grDevices utils     datasets  methods   base

     > loaded via a namespace (and not attached):
     > [1] tools_3.0.0

     > --
     > Hervé Pagès

     > Program in Computational Biology
     > Division of Public Health Sciences
     > Fred Hutchinson Cancer Research Center
     > 1100 Fairview Ave. N, M1-B514
     > P.O. Box 19024
     > Seattle, WA 98109-1024

     > E-mail: hpa...@fhcrc.org
     > Phone:  (206) 667-5791
     > Fax:    (206) 667-1319

     > ______________________________________________
     > R-devel@r-project.org mailing list
     > https://stat.ethz.ch/mailman/listinfo/r-devel

