On Jul 24, 2009, at 8:17 AM, Bryan Hanson wrote:

I find that after subsetting (you may prefer "conditional selection") a data frame and assigning it to a new object, calling str() on the new object reflects the
original data frame, not the new one:

A <- rnorm(20)
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))
D <- data.frame(A, B, C)

str(D) # reports correctly

E <- D[D$C == "h",]

str(E) # reports that C still has 2 levels, but
E # or E$C shows that subsetting worked properly
summary(E) # shows the original structure and that subsetting worked
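A minimal sketch of what str() is actually reporting: the subset keeps every row it should, but each factor column still carries its full set of levels, some of which now have zero counts. It rebuilds the same D and E as above. The set.seed() call is an addition, there only to make the numbers reproducible.

```r
# Rebuild the example data (set.seed() added only for reproducibility)
set.seed(1)
A <- rnorm(20)
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))
D <- data.frame(A, B, C)

E <- D[D$C == "h", ]

nrow(E)        # 10 rows survive the subset
levels(E$C)    # both "h" and "l" are still listed as levels
table(E$C)     # ...but "l" now has a count of zero
unique(E$C)    # only "h" actually occurs in the data
```

The same thing happens to B: every selected row has B == "t", yet its level "g" is retained.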

Is this the expected behavior, and if so, is there a particular rationale? I am fairly certain that the information about E was inherited from D,
but why wasn't it updated to reflect the revised object?  Is there an
argument that I can use to force the updating?

For better or worse, I use str() a lot to check my work, and in this case,
it seems to have misled me.

Thanks as always, Bryan

See ?"[.factor", which is the extract (subset) method for factors. Note that the 'drop' argument is FALSE by default. It is this argument that controls the retention of unused factor levels.
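A short sketch of the 'drop' argument of "[.factor" applied to a plain factor vector (the vector f is made up for illustration):

```r
f <- factor(c("h", "l", "h", "l"))

kept    <- f[f == "h"]               # default drop = FALSE: unused level "l" retained
dropped <- f[f == "h", drop = TRUE]  # drop = TRUE: unused levels removed

levels(kept)     # "h" "l"
levels(dropped)  # "h"
```

Both results contain the same two values; only the recorded level set differs, and that level set is what str() reports.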

The reason it is FALSE by default is to ensure that if you are comparing factors from more than one data source, comparisons between them, and any use of their levels, remain consistent.
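To illustrate that rationale, here is a sketch that uses two subsets of one factor as stand-ins for two "data sources" (the site, batch1 and batch2 names are hypothetical). Because both subsets keep the full level set, their tables line up category by category and the factors can be compared directly. If the unused levels are dropped, the level sets differ and "==" on the two factors stops with an error.

```r
# Two hypothetical "data sources" drawn from the same factor
site   <- factor(c("north", "south", "north", "east"))
batch1 <- site[1:2]   # contains north, south
batch2 <- site[3:4]   # contains north, east

# Unused levels are retained, so both subsets share one level set
identical(levels(batch1), levels(batch2))
tab1 <- table(batch1)  # counts for east, north, south
tab2 <- table(batch2)  # same categories, same order
tab1 + tab2            # element-wise sum lines up category by category
batch1 == batch2       # element-wise comparison is allowed

# Re-creating the factors with factor() drops unused levels; the level
# sets then differ and comparing the two factors raises an error
b1  <- factor(batch1)  # levels: north, south
b2  <- factor(batch2)  # levels: east, north
res <- tryCatch(b1 == b2, error = function(e) conditionMessage(e))
res
```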

For one approach to dropping unused factor levels from a data frame, see:

  http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
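One common idiom for dropping unused levels from every factor column of a data frame is sketched below. It is not necessarily the approach described on the wiki page above. It re-applies factor() to each factor column, which keeps only the levels that actually occur, and leaves other columns untouched. Assigning into E[] preserves the data frame structure and row names. Versions of R released after this thread also provide droplevels(), which does the same job on a data frame.

```r
set.seed(1)  # only for reproducibility
D <- data.frame(A = rnorm(20),
                B = factor(rep(c("t", "g"), 10)),
                C = factor(rep(c("h", "l"), 10)))
E <- D[D$C == "h", ]

# Re-create each factor column from its current values; factor() keeps
# only the levels that occur. Non-factor columns pass through unchanged.
E[] <- lapply(E, function(x) if (is.factor(x)) factor(x) else x)

str(E)  # B and C now each report a single level
```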

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
