On Fri, Jun 23, 2017 at 2:20 PM, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
> I had the chance to look at > 1300 SPSS files our consulting center
> collected during the last 20 years, and in several hundred cases we found
> such a problem that was a copy & paste error and simply wrong.
>
> Only in < 5 cases condensing several levels into one was appropriate,
> hence we decided to keep duplicated levels by changing the names as the
> default.

I understand where you're coming from. I know from personal experience exactly how much this is a pain in the ass, but I also have to group different labels into fewer categories in about every data set I get from clients or students, especially when things come from surveys with 30 different education categories etc.

So I would argue that checking for duplicate labels is a task for read.spss() and can be added there as an extra check if necessary. Personally, I don't see the fact that clients regularly mess up SPSS files as enough of an argument against changing the behaviour of factor().

> Based on this experience I'd propose not to touch factor but rather add a
> function that easily allows for this reduction, if we do not have that
> already.

There are already functions that allow this, like the tidyverse dplyr::recode_factor() function. It's rather trivial to do with logical operators and indices, and I have my own "recode" function so I don't have to rely on any package or retype the same construct over and over again with different values. But a clean and logical way to recode/group different levels when constructing the factor would, at least for me, be very convenient.

But I'm just a guy and I'm not writing the code, so in the end it's up to you guys.

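To make the kind of grouping I mean concrete, here is a minimal base R sketch. The education categories and the names `edu`, `grouped` and `grouped2` are made up for illustration and do not come from any real data set. It shows two ways of collapsing levels after the factor has been built: assigning a named list to levels(), and the "logical operators and indices" construct.

```r
# Made-up education variable; the category names are purely illustrative
edu <- factor(c("primary", "lower secondary", "upper secondary",
                "bachelor", "master", "phd", "bachelor"))

# Option 1: assign a named list to levels(); each name becomes a new level
# that absorbs the old levels listed under it
grouped <- edu
levels(grouped) <- list(
  basic     = "primary",
  secondary = c("lower secondary", "upper secondary"),
  tertiary  = c("bachelor", "master", "phd")
)

# Option 2: the same grouping with logical indexing on the levels;
# levels that end up with the same name are merged
grouped2 <- edu
levels(grouped2)[levels(grouped2) %in% c("lower secondary", "upper secondary")] <- "secondary"
levels(grouped2)[levels(grouped2) %in% c("bachelor", "master", "phd")] <- "tertiary"
levels(grouped2)[levels(grouped2) == "primary"] <- "basic"

# Counts per collapsed category
table(grouped)
```

Both options give the same observations per category, though the order of the resulting levels can differ between them. Note that with the list form, any old level not mentioned in the list is turned into NA, so it is worth checking the result with table(..., useNA = "ifany").
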
Cheers
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 (0)9 264 61 79
joris.m...@ugent.be

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel