Hi Kirk,

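It's because tension is a factor with three levels, as you can see with
str(warpbreaks). Subsetting removes rows but leaves the factor's set of
levels untouched, so summary() still reports M and H with zero counts.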

Factors are one of the mysteries of R that distinguish a novice from
an initiate.
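
To make that a little less mysterious: a factor is stored as integer codes
plus a separate vector of level labels, and subsetting only touches the
codes. A small sketch with a made-up toy factor f (not taken from your data):

```r
# f is a hypothetical toy factor, used only for illustration.
f <- factor(c("L", "M", "H", "L"), levels = c("L", "M", "H"))

as.integer(f)   # the stored codes: 1 2 3 1
levels(f)       # the level labels: "L" "M" "H"

# Subsetting keeps the full set of levels, even the ones no longer used.
g <- f[f == "L"]
levels(g)       # still "L" "M" "H"
table(g)        # counts: L = 2, M = 0, H = 0
```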

Reading ?subset directs you to ?droplevels. Here's an example:

> summary(warpbreaks)
     breaks      wool   tension
 Min.   :10.00   A:27   L:18
 1st Qu.:18.25   B:27   M:18
 Median :26.00          H:18
 Mean   :28.15
 3rd Qu.:34.00
 Max.   :70.00
> str(warpbreaks)
'data.frame':    54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...
> ?subset
> wb.subset <- warpbreaks[which(warpbreaks$tension=="L"),]
> summary(wb.subset)
     breaks      wool  tension
 Min.   :14.00   A:9   L:18
 1st Qu.:26.00   B:9   M: 0
 Median :29.50         H: 0
 Mean   :36.39
 3rd Qu.:49.25
 Max.   :70.00
> wb.subset <- droplevels(wb.subset)
> summary(wb.subset)
     breaks      wool  tension
 Min.   :14.00   A:9   L:18
 1st Qu.:26.00   B:9
 Median :29.50
 Mean   :36.39
 3rd Qu.:49.25
 Max.   :70.00
>
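
If you only want to clean up one column rather than every factor in the
data frame, re-creating that factor should do the same job. A minimal
sketch using the same subset as above; wb.all and wb.one are just names I
made up for the two results:

```r
# Keep only the low-tension rows; the factor still carries all three levels.
wb.subset <- warpbreaks[which(warpbreaks$tension == "L"), ]
levels(wb.subset$tension)        # "L" "M" "H"

# Option 1: drop unused levels from every factor in the data frame.
wb.all <- droplevels(wb.subset)
levels(wb.all$tension)           # "L"

# Option 2: rebuild a single factor, leaving the other columns alone.
wb.one <- wb.subset
wb.one$tension <- factor(wb.one$tension)
levels(wb.one$tension)           # "L"
```

Either way the rows themselves are unchanged; only the unused level labels
are discarded.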


Sarah

On Thu, Nov 29, 2012 at 11:32 AM, Stodola, Kirk <kstod...@illinois.edu> wrote:
> I'm manipulating a large dataset and need to eliminate some observations 
> based on specific identifiers.  This isn't a problem in and of itself (using 
> which.. or subset..) but an imprint of the deleted observations seems to
> remain, even though they have 0 observations.  This is causing me problems
> later on.  I'll use the dataset warpbreaks to illustrate; I apologize if this
> isn't in the best format.
>
> ##Summary of warpbreaks suggests three tension levels (H, M, L)
>> summary(warpbreaks)
>
>      breaks      wool   tension
>  Min.   :10.00   A:27   L:18
>  1st Qu.:18.25   B:27   M:18
>  Median :26.00          H:18
>  Mean   :28.15
>  3rd Qu.:34.00
>  Max.   :70.00
>
> ## Subset the dataset and keep only those observations with "L"
>> wb.subset <- warpbreaks[which(warpbreaks$tension=="L"),]
>
>
> ##Summary of the subsetted data shows: L=18, M=0, H=0. Why are M and H still
> included?
>> summary(wb.subset)
>
>      breaks      wool  tension
>  Min.   :14.00   A:9   L:18
>  1st Qu.:26.00   B:9   M: 0
>  Median :29.50         H: 0
>  Mean   :36.39
>  3rd Qu.:49.25
>  Max.   :70.00
>
> ##The subsetted dataset does not show M or H
>> wb.subset
>
> Is there a way that M & H can be completely eliminated (i.e. they don't show 
> up in summary)? The only way I found was to export the dataset and then 
> reimport, which seems pretty cumbersome.  Thanks in advance for any help.  
> -Kirk
>

--
Sarah Goslee
http://www.functionaldiversity.org
