On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:
> Hi,
> Would anyone have any idea why I would obtain completely
> different results when subsetting with the subset function vs.
> bracket notation?
> I have a data frame with 65 variables and 4382 rows. When I
> execute the following subset command, I get the correct result (125
> rows):
> subset(df, Renewal==TRUE, 1:2)
> However, when I tried to obtain the same result with bracket notation
> as follows, the output gave me all the rows in the data frame and not
> just the subset of 125 I was looking for.
> df[df$Renewal==TRUE, 1:2]
> The 'Renewal' variable is of logical type and is the last (65th)
> variable in the data frame. Its values are either TRUE or NA
> (there are no FALSE values).
That's exactly it. If an element of a logical index is NA, "[" extraction
still returns a row for it, filled with NAs (and given a row name such as
NA or NA.1). You can correct what I consider a failing, and others
consider a feature, with:
df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]
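A minimal sketch of the difference and of the fix, assuming a small made-up data frame: the names `df`, `id`, `score` and all values are hypothetical stand-ins for the poster's 65-column data, with Renewal holding only TRUE or NA as described in the question. For data frames, subset() evaluates the condition and keeps only `r & !is.na(r)`, so NA counts as FALSE; "[" instead turns each NA in a logical index into a row of NAs. The which() and %in% forms are two other standard base-R idioms that give the same NA-safe result.

```r
# Hypothetical stand-in for the poster's data: 'id' and 'score' are made up,
# and Renewal holds only TRUE or NA (no FALSE), as in the question.
df <- data.frame(id = 1:6,
                 score = c(10, 20, 30, 40, 50, 60),
                 Renewal = c(TRUE, NA, NA, TRUE, NA, TRUE))

# subset() drops rows whose condition is NA (it keeps r & !is.na(r))
via_subset <- subset(df, Renewal == TRUE, 1:2)

# Plain "[" returns an all-NA row for every NA in the logical index,
# so with no FALSE values every row of df comes back
via_bracket <- df[df$Renewal == TRUE, 1:2]

# NA-safe bracket equivalents of the subset() call
fix_and   <- df[df$Renewal == TRUE & !is.na(df$Renewal), 1:2]  # the reply's fix
fix_which <- df[which(df$Renewal), 1:2]     # which() skips NA positions
fix_in    <- df[df$Renewal %in% TRUE, 1:2]  # %in% never returns NA

nrow(via_subset)    # 3
nrow(via_bracket)   # 6 (3 real rows plus 3 all-NA rows)
rownames(fix_and)   # "1" "4" "6"
```

Because this Renewal column has no FALSE values, the plain bracket version returns as many rows as the data frame has, which matches the symptom in the question (all 4382 rows instead of 125).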
> My attempts at replicating this with a small dummy data set to
> include here have not worked (i.e., I don't see the problem when I
> use synthetic data). Any ideas on what could be going on?
You _should_ get the predicted behavior. Perhaps your test case was
flawed?
> dat <- data.frame(test1=1, Renewal=as.logical(sample(c(0,1,NA), 20, replace=TRUE)))
> dat[dat$Renewal==TRUE, ]
     test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA
This is all described in ?"["
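Because sample() makes the test case above random, the exact rows differ on every run. A sketch of a deterministic variant, assuming a fixed repeating FALSE/TRUE/NA pattern (my choice, not part of the original test) in place of the random draw, so the counts can be checked:

```r
# Deterministic variant of the test case above: a fixed repeating
# FALSE/TRUE/NA pattern replaces sample(), so every run gives the same counts.
dat <- data.frame(test1 = 1,
                  Renewal = as.logical(rep(c(0, 1, NA), length.out = 20)))

table(dat$Renewal, useNA = "ifany")       # 7 FALSE, 7 TRUE, 6 NA

bracket   <- dat[dat$Renewal == TRUE, ]   # FALSE rows dropped, NA rows kept
subsetted <- subset(dat, Renewal == TRUE) # NA treated as FALSE

nrow(bracket)                 # 13: 7 TRUE rows plus 6 all-NA rows
nrow(subsetted)               # 7
sum(is.na(bracket$test1))     # 6
```

Here FALSE rows disappear from both results, so the NA rows are the only difference; in the poster's data, with no FALSE values at all, that difference covers every non-TRUE row.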
--
David Winsemius, MD
Alameda, CA, USA
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help