[Rd] Suggestion on default 'levels' in 'factor'

Suharto Anggono via R-devel, Fri, 06 May 2016 01:06:13 -0700

On first reading, the logic of the following fragment in the code of function
'factor' was not clear to me.
    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
        y <- as.character(y)
        levels <- unique(y[ind])
    }


Code similar to that originally proposed in
https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to
me.

I suggest using this.
    if (missing(levels))
        levels <- unique(as.character(
            sort.int(unique(x, nmax = nmax), na.last = TRUE) # or possibly sort(x) which is more (too ?) tolerant
            ))
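
As a rough illustration, the suggested fragment can be wrapped in a standalone function and compared with what 'factor' currently produces. The name 'suggested_levels' is hypothetical and not part of R. Calling 'factor' with exclude = NULL is an assumption made here so that NA stays among the levels and the full default is visible.

```r
# Hypothetical helper wrapping the suggested fragment; not part of base R.
# 'nmax' is passed on to unique(), as in factor().
suggested_levels <- function(x, nmax = NA) {
    unique(as.character(
        sort.int(unique(x, nmax = nmax), na.last = TRUE)
    ))
}

x <- c("b", "a", NA, "c", "a")
print(suggested_levels(x))

# exclude = NULL keeps NA as a level, so nothing is dropped after sorting.
print(levels(factor(x, exclude = NULL)))
```

If the suggestion has the same effect as the current code, both calls should print the same vector.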

I assume that as.character(y)[sort.list(y)] is equivalent to
as.character(sort.int(y, na.last = TRUE)). So, what I suggest above has the
same effect as the code in the current 'factor'. I use 'sort.int' rather than
'sort' so that, like 'sort.list', it fails for non-atomic input.
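
The assumed equivalence can be spot-checked on a few atomic vectors. This is only a sketch: it covers a handful of already-unique inputs, not every type or locale, and the helper names 'same_result' and 'fails' are hypothetical.

```r
# Compare the two expressions on an already-unique vector y.
same_result <- function(y) {
    identical(as.character(y)[sort.list(y)],
              as.character(sort.int(y, na.last = TRUE)))
}

checks <- c(
    integer   = same_result(c(3L, 1L, NA, 2L)),
    double    = same_result(c(2.5, NA, -1, 10)),
    character = same_result(c("b", "a", NA, "c")),
    logical   = same_result(c(TRUE, NA, FALSE))
)
print(checks)

# Both functions are expected to signal an error for a list (non-atomic).
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")
print(fails(sort.list(list(2, 1))))
print(fails(sort.int(list(2, 1), na.last = TRUE)))
```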

What I suggest is similar in form to the default 'levels' in 'factor' in R
before version 2.10.0, which was
    sort(unique.default(x), na.last = TRUE)

If this suggestion is used, the help page for 'factor' can be changed to say 
"(by 'sort.int')" instead of "(by 'sort.list')".

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
