In R devel, function 'factor' has been changed to allow duplicated 'labels' and
merge them.
Issue 1: Handling of specified 'labels' without duplicates is slower than
before.
Example:
x <- rep(1:26, 40000)
system.time(factor(x, levels=1:26, labels=letters))
Function 'factor' is already rather slow because of the conversion to character.
Please don't make it slower.
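A single system.time() call is noisy. Below is a minimal sketch for comparing the call with and without 'labels', taking the median over a few runs. The helper name 'elapsed' and the repetition count are assumptions of this sketch, not anything provided by R.

```r
x <- rep(1:26, 40000)

# Hypothetical helper: median elapsed time of f() over n runs.
elapsed <- function(f, n = 3L) {
  median(replicate(n, system.time(f())[["elapsed"]]))
}

t_default <- elapsed(function() factor(x, levels = 1:26))
t_labels  <- elapsed(function() factor(x, levels = 1:26, labels = letters))

# A ratio above 1 means the call with explicit 'labels' was slower here.
t_labels / t_default
```

Running this under both R release and R devel should show whether the 'labels' path has become slower.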
Issue 2: Although 'labels' defaults to 'levels', not specifying 'labels' can give
a different result from specifying 'labels' identical to 'levels'.
Example 1:
as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
is different from
as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA),
                  exclude = NULL))
File reg-tests-1d.R indicates that the behavior of 'factor' with NA has changed
slightly, for the better. An NA entry (NA because it is unmatched in the 'levels'
argument or is in 'exclude') is absorbed into the NA in the "levels" attribute
(which comes from the 'labels' argument), if there is one. The issue is that this
happens only when 'labels' is specified.
Function 'factor' could use match(xlevs, nlevs)[f] instead. That form doesn't map
an NA entry to the NA level. When 'f' is long enough, longer than 'xlevs', it is
also faster than match(xlevs[f], nlevs).
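The difference between the two forms can be seen with 'match' alone, using the values from Example 1. This is only a sketch: 'xlevs', 'nlevs' and 'f' are set up by hand here to mimic what 'factor' computes internally, which is an assumption about the R devel code. The names 'by_element' and 'by_level' are just for illustration.

```r
xlevs <- as.character(c(2, NA))                # labels as character: "2", NA
nlevs <- unique(xlevs)                         # merged labels: "2", NA
f <- match(as.character(c(NA, 2, 3)), xlevs)   # codes into xlevs: 2 1 NA

by_element <- match(xlevs[f], nlevs)   # 2 1 2 : unmatched 3 lands in the NA level
by_level   <- match(xlevs, nlevs)[f]   # 2 1 NA: unmatched 3 stays NA
```

The second form calls 'match' on the labels only, once per level, and then does plain integer indexing by 'f'.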
Example 2:
With
levs <- c("A","A")
the call
factor(levs, levels=levs)
gives an error, but
factor(levs, levels=levs, labels=levs)
doesn't.
Note: In theory, if function 'factor' merged duplicated 'labels' in all cases, or
at least in
factor(c(sqrt(2)^2, 2))
then it could do the matching on the original 'x' (without conversion to
character), as R did before version 2.10.0. If function 'factor' did that,
factor(c(sqrt(2)^2, 2), levels = c(sqrt(2)^2, 2), labels = c("sqrt(2)^2", "2"))
could treat sqrt(2)^2 and 2 as distinct.
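To illustrate why the conversion to character matters here, a small sketch using only base R (the variable names are just for illustration):

```r
a <- sqrt(2)^2

is_equal_num <- a == 2                                # FALSE: rounding error
is_equal_chr <- as.character(a) == as.character(2)    # TRUE: both become "2"

m_num <- match(a, 2)                                  # NA: distinct as numbers
m_chr <- match(as.character(a), as.character(2))      # 1 : same after conversion
```

Matching on the numeric values keeps the two apart, while matching on the character form merges them.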
Another thing: Function 'factor' in R devel uses 'order' instead of 'sort.list'.
The case of as.factor(x) for
x <- as.data.frame(character(0))
in tests/isas-tests.Rout.save reveals that 'order' behaves strangely on a data frame.
x <- as.data.frame(character(0))
y <- unique(x)
length(y) # 1
length(order(y)) # 0
length(as.character(y)) # 1
So order(y) does not have the same length as as.character(y).
Another example:
length(mtcars) # 11
length(order(mtcars)) # 352
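For contrast, a sketch of how the rows of a data frame are usually ordered: the columns are passed to 'order' as separate sort keys via do.call, giving one index per row. This only illustrates the length mismatch and is not a proposed change to 'factor'. The name 'row_ord' is hypothetical.

```r
# Columns as successive sort keys; unname() avoids clashes between
# column names and the named arguments of order().
row_ord <- do.call(order, unname(as.list(mtcars)))

length(row_ord)                # 32, one index per row
length(as.character(mtcars))   # 11, one string per column
```

Neither length agrees with the other, so indexing as.character(y) by an ordering of y is only meaningful when y is a plain vector.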