> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Schwab,Wilhelm K
> Sent: Thursday, February 25, 2010 3:51 PM
> To: r-help@r-project.org
> Subject: [R] Ordering categories on a boxplot - a serious trap??
>
> Hello all,
>
> I think I probably did something stupid, and R's part was to
> allow me to do it. My goal was to control the order of
> factor levels appearing horizontally on a boxplot. Enter
> search engines and perhaps some creative stupidity on my
> part, and I came up with the following:
>
>    v=read.table("factor-order.txt",header=TRUE);
>    levels(v$doseGroup) = c("L", "M", "H");
>    boxplot(v$dose~v$doseGroup);

levels<- only relabels the existing levels, as if translating
them into another language; it does not change the integer
codes of the factor. If you want to reorder the levels, call
factor(..., levels=). E.g.,

   > z <- factor(c("Small","Large","Medium","Small"))
   > str(z)
    Factor w/ 3 levels "Large","Medium",..: 3 1 2 3
   > str(factor(z, levels=c("Small","Medium","Large")))
    Factor w/ 3 levels "Small","Medium",..: 1 3 2 1

You can also relabel them by using the labels= argument to factor():

   > str(factor(z, levels=c("Small","Medium","Large"), labels=c("S","M","L")))
    Factor w/ 3 levels "S","M","L": 1 3 2 1

Calling levels<- changes nothing but the level labels:

   > zcopy <- z
   > levels(zcopy) <- c("Small","Medium","Large")
   > str(zcopy)
    Factor w/ 3 levels "Small","Medium",..: 3 1 2 3

levels<- is handy for low-level manipulations but not for
general use. Even factor(,levels=) can be a bit dangerous:
if a new level is misspelled, it will silently add NAs
to the data:

   > str(factor(z, levels=c("Smal", "Medium", "Large")))
    Factor w/ 3 levels "Smal","Medium",..: NA 3 2 NA

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> A good way to see the trap is to evaluate:
>
>    v=read.table("factor-order.txt",header=TRUE);
>    par(mfrow=c(2,1));
>    boxplot(v$dose~v$doseGroup);
>    levels(v$doseGroup) = c("L", "M", "H");
>    boxplot(v$dose~v$doseGroup);
>    par(mfrow=c(1,1));
>
> The above creates two plots, one correct with the factors in
> an inconvenient order, and one that is WRONG. In the latter,
> the labels appear in the desired order, but the data does not
> "move with them." I did not discover the problem until I
> repeated the same type of plot with something that had a
> known relationship with the levels, and the result was
> clearly not correct.
>
> I *think* the problem is to assign to the return value of
> levels(). How did I think to do that?
> I'm not really sure, but please look at
>
>    https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html
>
> Perhaps it does not say to do exactly what I did, but it sure
> was easy to follow to the mistake, it appeared to do what I
> wanted, and the consequences of the mistake are ugly.
> Perhaps levels() should return something that is immutable??
> If I am looking at this correctly, levels() is an accident
> waiting to happen.
>
> What should I have done? It seems:
>
>    # read data and order factor levels
>    v=read.table("factor-order.txt",header=TRUE);
>    group = factor(v$doseGroup,levels = c("L", "M", "H") );
>    boxplot(v$dose~group);
>
> One disappointment is that the above factor() call apparently
> needs to be repeated for any subset of v - I'm still trying
> to get my mind around that one.
>
> Can anyone confirm this? It strikes me as a trap that should
> be addressed so that an error results rather than a garbage graph.
>
> Bill
>
> ---
> Wilhelm K. Schwab, Ph.D.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
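
On the worry in the quoted message that factor() has to be repeated for every subset of v: that is probably a side effect of keeping the reordered factor in a separate variable (group) instead of storing it back in the data frame. Below is a minimal sketch, not the poster's actual workflow. The data frame is a made-up stand-in for factor-order.txt (the real file is not shown in the thread), and trap, trap_means, good_means and sub are names invented for the illustration.

```r
# Hypothetical stand-in for factor-order.txt: made-up values, not the poster's data.
v <- data.frame(
  dose      = c(1, 2, 10, 11, 50, 55),
  doseGroup = c("L", "L", "M", "M", "H", "H"),
  stringsAsFactors = TRUE   # make doseGroup a factor, as older read.table() did by default
)

levels(v$doseGroup)   # default order is alphabetical: "H" "L" "M"

# The trap: levels<- renames H->L, L->M, M->H without moving any data.
trap <- v$doseGroup
levels(trap) <- c("L", "M", "H")
trap_means <- tapply(v$dose, trap, mean)   # the group now labelled "L" holds the high doses

# Reorder once, inside the data frame, so labels and data stay matched.
v$doseGroup <- factor(v$doseGroup, levels = c("L", "M", "H"))
good_means <- tapply(v$dose, v$doseGroup, mean)

# Subsets of v inherit the level order, so factor() is not repeated.
sub <- v[v$dose > 5, ]
levels(sub$doseGroup)   # still "L" "M" "H"; the now-empty "L" is kept

# droplevels() removes empty levels but preserves the chosen order.
sub <- droplevels(sub)
levels(sub$doseGroup)   # "M" "H"
```

With the factor stored in v, a call such as boxplot(dose ~ doseGroup, data = sub) should draw the boxes in the chosen order without any further factor() call.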