I'm manipulating a large dataset and need to eliminate some observations based
on specific identifiers. This isn't a problem in and of itself (using which()
or subset()), but an imprint of the deleted observations seems to remain, even
though they have 0 observations. This is causing me problems later on. I'll
use the dataset warpbreaks to illustrate. I apologize if this isn't in the best
format.
##Summary of warpbreaks suggests three tension levels (H, M, L)
> summary(warpbreaks)
     breaks      wool   tension
 Min.   :10.00   A:27   L:18
 1st Qu.:18.25   B:27   M:18
 Median :26.00          H:18
 Mean   :28.15
 3rd Qu.:34.00
 Max.   :70.00
## Subset the dataset and keep only those observations with "L"
> wb.subset <- warpbreaks[which(warpbreaks$tension=="L"),]
##Summary of the subsetted data shows L=18, M=0, H=0. Why are M and H still included?
> summary(wb.subset)
     breaks      wool  tension
 Min.   :14.00   A:9   L:18
 1st Qu.:26.00   B:9   M: 0
 Median :29.50         H: 0
 Mean   :36.39
 3rd Qu.:49.25
 Max.   :70.00
##The subsetted dataset does not show M or H
> wb.subset
Is there a way that M & H can be completely eliminated (i.e. they don't show up
in summary)? The only way I found was to export the dataset and then reimport,
which seems pretty cumbersome. Thanks in advance for any help. -Kirk
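
For reference, a minimal sketch of one possible approach. It assumes the leftover M and H entries come from tension being a factor, which keeps all of its levels after subsetting, and it uses base R's droplevels() (available since R 2.12.0) or a fresh call to factor(). The name wb.clean is made up for illustration.

```r
# Subsetting keeps every level of a factor, even levels with no rows left.
wb.subset <- warpbreaks[warpbreaks$tension == "L", ]
levels(wb.subset$tension)   # "L" "M" "H"

# Option 1: drop unused levels from every factor column of the data frame.
wb.clean <- droplevels(wb.subset)

# Option 2: rebuild a single factor so that only observed values are levels.
wb.subset$tension <- factor(wb.subset$tension)

levels(wb.clean$tension)    # "L"
summary(wb.clean)           # tension column now lists only L
```

Option 1 is convenient when several factor columns are affected at once; Option 2 touches only the column named.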