On Jul 24, 2009, at 8:17 AM, Bryan Hanson wrote:
I find that after subsetting (you may prefer "conditional selection") a data
frame and assigning it to a new object, str(new object) reflects the
original data frame, not the new one:
A <- rnorm(20)
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))
D <- data.frame(A, B, C)
str(D)      # reports correctly
E <- D[D$C == "h", ]
str(E)      # reports that E$C still has 2 levels (like D$C), but
E           # or E$C shows that subsetting worked properly
summary(E)  # shows the original levels and that subsetting worked
Is this the expected behavior, and if so, is there a particular rationale?
I'm fairly certain that the information about E was inherited from D,
but why wasn't it updated to reflect the revised object? Is there an
argument I can use to force the update?
For better or worse, I use str() a lot to check my work, and in this case
it seems to have misled me.
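The following minimal sketch rebuilds the example above and separates the levels attribute from the values that E$C actually holds. For a factor, str() summarises the levels attribute. The call to set.seed(1) is an addition so that the run is repeatable:

```r
set.seed(1)  # added only so the run is repeatable
A <- rnorm(20)
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))
D <- data.frame(A, B, C)
E <- D[D$C == "h", ]

levels(E$C)                # the level set str() reports: still "h" and "l"
unique(as.character(E$C))  # the values actually present: only "h"
table(E$C)                 # "l" is still a category, with a count of 0
```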
Thanks as always, Bryan
See ?"[.factor", which documents the extract (subset) method for factors. Note
that its 'drop' argument is FALSE by default. This argument controls whether
unused factor levels are retained after subsetting.
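Here is a short sketch of the 'drop' argument on a bare factor. The names f, keep, f_default and f_dropped are illustrative only. With the default, indexing keeps the full level set. Passing drop = TRUE removes any level that no longer occurs:

```r
f <- factor(rep(c("h", "l"), 10))
keep <- f == "h"

f_default <- f[keep]               # drop = FALSE: "l" is kept as an unused level
f_dropped <- f[keep, drop = TRUE]  # unused levels are removed

levels(f_default)
levels(f_dropped)
```

The data frame method, ?"[.data.frame", also has a 'drop' argument, but it means something different. It controls whether the result is simplified to a lower dimension, not whether factor levels are kept. It will therefore not clean up the factor columns of E.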
It is FALSE by default so that, when you compare factors drawn from more
than one data source (or from different subsets of the same data), the
comparisons, and any use of the factor levels, stay consistent.
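The following sketch illustrates that rationale, reusing B and C from the original example. Each subset keeps the complete level set. As a result, tables built from different subsets share the same categories in the same order. They can then be compared or combined position by position:

```r
B <- factor(rep(c("t", "g"), 10))
C <- factor(rep(c("h", "l"), 10))

# Each subset happens to contain only one of C's levels
tab_t <- table(C[B == "t"])  # h = 10, l = 0
tab_g <- table(C[B == "g"])  # h = 0,  l = 10

identical(names(tab_t), names(tab_g))  # same categories, same order
tab_t + tab_g                          # counts add up level by level
```

If both subsets had been taken with drop = TRUE, each table would contain a single, different category. Adding the two tables would then no longer match counts by level.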
For one approach to dropping unused factor levels from a data frame, see:
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
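One common base-R idiom rebuilds each factor column with factor(), which keeps only the levels that are actually observed. This is a sketch and not necessarily the approach described on the wiki page. The data frame is rebuilt here with a fixed seed, which is an addition, so that the block runs on its own:

```r
set.seed(1)  # added only so the run is repeatable
D <- data.frame(A = rnorm(20),
                B = factor(rep(c("t", "g"), 10)),
                C = factor(rep(c("h", "l"), 10)))
E <- D[D$C == "h", ]

# Re-create every factor column from its current values; factor() applied to a
# factor keeps only the levels that occur. E[] <- keeps E a data frame.
E[] <- lapply(E, function(x) if (is.factor(x)) factor(x) else x)

str(E)  # C (and B) now show a single level
```

B also loses its unused level ("g") here, because every retained row has B == "t". Later versions of R also provide droplevels(), which does the same for a whole data frame in one call.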
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help