On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys <jorism...@gmail.com> wrote:
> To extend on Martin's explanation:
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.
>
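To make the quoted distinction concrete, here is a minimal sketch. The input values "lo"/"hi" and the labels "Low"/"High" are made up for illustration and do not come from the thread.

```r
# Hypothetical toy data: raw input values as they might appear in a file.
x <- c("lo", "hi", "lo")

# 'levels' lists the input values to recognise; 'labels' gives the
# names those values carry in the resulting factor.
f <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))

levels(f)        # "Low" "High": the labels, not the input values
as.character(f)  # "Low" "High" "Low"
```

After construction the original input values are no longer stored in the factor; only the labels remain, and levels() reports them.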
Dear Joris

I think we agree. Currently, factor insists both levels and labels be
unique. I wish that it would accept nonunique labels. I also
understand it is impractical to change this now in base R.

I don't think I succeeded in explaining why this would be nicer.
Here's another example. Fairly often, we see input data like

    x <- c("Male", "Man", "male", "Man", "Female")

The first four represent the same value. I'd like to go in one step
to a new factor variable with enumerated types "Male" and "Female".
This fails

    xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
                 labels = c("Male", "Male", "Male", "Female"))

Instead, we need 2 steps.

    xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
    levels(xf) <- c("Male", "Male", "Male", "Female")

I think it is quirky that `levels<-.factor` allows the duplicated
labels, whereas factor does not.

I wrote a function rockchalk::combineLevels to simplify combining
levels, but most of the students here like plyr::mapvalues to do it.
The use of levels() can be tricky because one must enumerate all
values, not just the ones being changed.

But I do understand Martin's point. It's been this way 25 years, it
won't change. :).

> Cheers
> Joris
>
>

--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
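As a side note on the two-step recoding in the message above, base R's `levels<-` for factors also accepts a named list, which avoids repeating the target label once per old level. This is only a sketch using the example vector from the message. It is not the approach of rockchalk::combineLevels or plyr::mapvalues, and any level left out of the list becomes NA, so every value still has to be listed.

```r
# Example input taken from the message above.
x <- c("Male", "Man", "male", "Man", "Female")

# Step 1: one level per distinct input value.
xf <- factor(x, levels = c("Male", "Man", "male", "Female"))

# Step 2: each list name is a new level; its value lists the old
# levels that are merged into it.
levels(xf) <- list(Male = c("Male", "Man", "male"), Female = "Female")

levels(xf)       # "Male" "Female"
as.character(xf) # "Male" "Male" "Male" "Male" "Female"
```

The named-list form keeps the mapping readable when many raw spellings collapse into a few categories.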