>>>>> Paul Johnson <pauljoh...@gmail.com> >>>>> on Fri, 16 Jun 2017 11:02:34 -0500 writes:
    > On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys <jorism...@gmail.com> wrote:
    >> To extend on Martin's explanation:
    >>
    >> In factor(), levels are the unique input values and labels the unique output
    >> values. So the function levels() actually displays the labels.
    >>

    > Dear Joris

    > I think we agree. Currently, factor insists both levels and labels be unique.
    > I wish that it would accept nonunique labels. I also understand it
    > is impractical to change this now in base R.

    > I don't think I succeeded in explaining why this would be nicer.
    > Here's another example. Fairly often, we see input data like

    > x <- c("Male", "Man", "male", "Man", "Female")

    > The first four represent the same value. I'd like to go in one step
    > to a new factor variable with enumerated types "Male" and "Female".
    > This fails

    > xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
    >              labels = c("Male", "Male", "Male", "Female"))

    > Instead, we need 2 steps.

    > xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
    > levels(xf) <- c("Male", "Male", "Male", "Female")

    > I think it is quirky that `levels<-.factor` allows the duplicated
    > labels, whereas factor does not.

    > I wrote a function rockchalk::combineLevels to simplify combining
    > levels, but most of the students here like plyr::mapvalues to do it.
    > The use of levels() can be tricky because one must enumerate all
    > values, not just the ones being changed.

    > But I do understand Martin's point. It's been this way 25 years, it
    > won't change. :).

Well.. the above is a bit out of context.

Your first example really did not make a point to me (and Joris),
and I showed that you could use even two different simple factor() calls to
produce what you wanted

    yc <- factor(c("1",NA,NA,"4","4","4"))
    yn <- factor(c( 1, NA,NA, 4, 4, 4))

Your new example is indeed much more convincing !
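For readers who want to try the example, here is a minimal sketch of the two-step idiom quoted above, next to the named-list form of the `levels<-` replacement function, which lets each target level be written only once. The variable name `xl` is introduced here purely for illustration, and the list form is offered as one possible alternative, not necessarily the shorter spelling meant in this thread.

```r
x <- c("Male", "Man", "male", "Man", "Female")

# Two-step approach from the quoted message:
# first build the factor with all raw values as levels ...
xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
# ... then collapse them by assigning duplicated labels.
levels(xf) <- c("Male", "Male", "Male", "Female")

# Alternative (illustrative): the named-list form of `levels<-`.
# Each list name is a new level; its value lists the old levels to merge.
xl <- factor(x)
levels(xl) <- list(Male = c("Male", "Man", "male"), Female = "Female")

# Both routes map every observation to the same label.
identical(as.character(xf), as.character(xl))
table(xf)
```

Both variants still require listing every raw value that occurs in the data, which is the inconvenience the quoted message points out.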
(Note though that the two steps that are needed can be written more shortly.)

The "been this way 25 years" is a reason to be very cautious(*) with changes,
but not a reason for no changes!

(*) Indeed, as some of you have noted, we really should not "break behavior".
This means to me that we cannot accept a change there which gives an error or a
different result in cases where the old behavior gave a valid factor.

I'm looking at a possible change currently
[not promising that a change will happen ...]

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel