Something that bit me:
The function relevel takes a factor, and a reference level to be promoted to
the first place.
If “ref” is a character, the level with that label is promoted; if it is
numeric, the “ref”-th level is promoted.
This turns out to be very confusing if you have a factor with numeric labels
(e.g. after reading in a csv with some dirty numeric columns and
stringsAsFactors = TRUE).
For example:
set.seed(1)
# stringsAsFactors=TRUE must be given explicitly from R 4.0.0 on
test <- data.frame(n=sample(c(1:100, letters[1:10]), size=90),
                   stringsAsFactors=TRUE)
test$n <- relevel(test$n, 50)
print(levels(test$n))
gives “62” as the first level.
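A smaller, deterministic illustration of the two modes may help here. This is only a sketch with made-up data; the levels are given explicitly so the result does not depend on the random seed or the locale:

```r
# Toy factor whose labels look numeric; the level order is fixed explicitly.
f <- factor(c(3, 1, 2), levels = c(3, 1, 2))
levels(f)                       # "3" "1" "2"

# Character ref: the level labelled "2" is promoted.
by_label <- relevel(f, ref = "2")
levels(by_label)                # "2" "3" "1"

# Numeric ref: the 2nd level, which is "1", is promoted.
by_position <- relevel(f, ref = 2)
levels(by_position)             # "1" "3" "2"
```

The data values themselves are unchanged in both cases; only the order of the levels differs.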
Could we make something like this an error, or at least issue a warning?
This matters all the more because other functions coerce automatically:
factor(…, levels=1:100) and levels(test$n) <- 1:100 both work fine.
So this is perhaps the most confusing case:
relevel(factor(1:10, levels = -10:20), 15)
gives “4” as the first level, because “4” is the 15th level of -10:20.
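As a workaround with the current behaviour, converting the reference to character first makes relevel match by label. A minimal sketch, reusing the factor from the example above (the variable names are only for illustration):

```r
f <- factor(1:10, levels = -10:20)

# Numeric ref: interpreted as a position, so the 15th level is promoted.
by_position <- relevel(f, ref = 15)
levels(by_position)[1]   # "4"

# Character ref: interpreted as a label, so the level "15" is promoted.
by_label <- relevel(f, ref = as.character(15))
levels(by_label)[1]      # "15"
```

Note that the character form stops with an error if no level with that label exists, which is arguably the safer failure mode.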
So far I’ve thought of 2 possible implementations that could be inserted in
stats::relevel.factor(), just before the is.character(ref) check:
if(is.numeric(ref) && ref %in% lev)
    warning('Provided numeric reference, note that this will promote the ',
            ref, 'th level, not the level with value "', ref, '"!')
or
if(is.numeric(ref) && any(!is.na(suppressWarnings(as.numeric(lev)))))
    warning('Provided numeric reference, note that this will promote the ',
            ref, 'th level, not the level with value "', ref, '"!')
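To try the idea without touching stats, the second check can be put in a small wrapper. This is only a sketch: relevel_checked is a hypothetical name, not an existing function, and the message wording is just a suggestion:

```r
# Hypothetical wrapper around stats::relevel(); not part of any package.
relevel_checked <- function(x, ref, ...) {
  lev <- levels(x)
  # Warn when ref is numeric and at least one level label parses as a number,
  # since the caller may have meant the label rather than the position.
  if (is.numeric(ref) &&
      any(!is.na(suppressWarnings(as.numeric(lev))))) {
    warning("Provided numeric reference, note that this will promote the ",
            ref, "th level, not the level with value \"", ref, "\"!",
            call. = FALSE)
  }
  relevel(x, ref = ref, ...)
}

# Numeric-looking labels: warns, then promotes the 15th level ("4").
f <- factor(1:10, levels = -10:20)
res <- suppressWarnings(relevel_checked(f, 15))
levels(res)[1]

# Purely alphabetic labels: no warning, the 2nd level is promoted.
g <- factor(c("a", "b", "c"))
levels(relevel_checked(g, 2))
```

The wrapper leaves the result of relevel unchanged and only adds the warning, so existing code relying on positional references would keep working.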
Best regards,
Emil Bode
Data-analyst
DANS: Netherlands Institute for Permanent Access to Digital Research Resources
DANS is an institute of the Dutch Academy KNAW<http://knaw.nl/nl> and funding
organisation NWO<http://www.nwo.nl/>.