On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:

Hi,

Would anyone have any idea as to why I would obtain completely different results when subsetting using the subset function vs bracket notation?

I have a data frame with 65 variables and 4382 rows. When I execute the following subset command, I get the correct results (125 rows):
subset(df, Renewal==TRUE, 1:2)


However, I tried to obtain the same results with bracket notation as follows. The output gave me all the rows in the data frame and not just the subset of 125 I was looking for.
df[df$Renewal==TRUE, 1:2]

The 'Renewal' variable is of logical type and is the last (65th) variable in the data frame. However, values are either TRUE or NA (there are no 'FALSE' values).

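That's exactly it. When an element of a logical index is NA, "[" does not drop that row; it returns a row filled with NA in its place. Since your column holds only TRUE and NA, every row is either kept or turned into an NA row, so the result has as many rows as the original data frame. subset(), by contrast, treats NA in the condition as FALSE. You can correct what I consider a failing and others consider a feature with: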

df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]
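For comparison, here is a minimal sketch of a few ways to get the same rows as subset(). The toy data frame and the object names (by_subset, by_bracket, by_which, by_in) are made up for illustration and are not from the original post; which() and %in% are used here because neither returns NA for an NA input.

```r
# Hypothetical toy data: Renewal is TRUE or NA, never FALSE
df <- data.frame(id = 1:6, Renewal = c(TRUE, NA, TRUE, NA, NA, TRUE))

# subset() treats NA in the condition as FALSE
by_subset <- subset(df, Renewal == TRUE)

# "[" with a logical index keeps a row of NAs wherever the index is NA
by_bracket <- df[df$Renewal == TRUE, ]

# which() returns only the positions that are TRUE, so NAs are discarded
by_which <- df[which(df$Renewal), ]

# %in% yields FALSE (not NA) for NA elements when the table has no NA
by_in <- df[df$Renewal %in% TRUE, ]

nrow(by_subset)   # rows where Renewal is TRUE
nrow(by_bracket)  # TRUE rows plus one all-NA row per NA
```

Which form to prefer is a matter of taste; the explicit !is.na() test above makes the intent most visible, while which() is the shortest.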


My attempts at replicating this with a small dummy data set, for including here, have not worked (i.e. I don't get an error when I use synthetic data). Any ideas on what could be going on?

You _should_ get the predicted behavior. Perhaps your test case was flawed?

> dat <- data.frame(test1=1, Renewal=as.logical( sample(c(0,1,NA), 20, replace=TRUE)))
> dat[dat$Renewal==TRUE, ]
     test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA
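A quick way to confirm what is happening in output like the above is to count the TRUE and NA values separately and compare them with the number of rows returned. The sketch below assumes a seed (set.seed(1), chosen only so the run is repeatable) and uses the made-up names res, n_true and n_na.

```r
set.seed(1)  # assumption: seed added only for repeatability
dat <- data.frame(test1 = 1,
                  Renewal = as.logical(sample(c(0, 1, NA), 20, replace = TRUE)))

# Bracket extraction with a logical index that contains NA
res <- dat[dat$Renewal == TRUE, ]

n_true <- sum(dat$Renewal, na.rm = TRUE)  # rows that really are TRUE
n_na   <- sum(is.na(dat$Renewal))         # rows where the condition is NA

# Every TRUE row plus one all-NA row per NA ends up in the result
nrow(res) == n_true + n_na

# Tabulating with NA included shows the split at a glance
table(dat$Renewal, useNA = "ifany")
```

If a synthetic data set shows no difference between subset() and "[", the likely reason is that it contains no NA values in the condition column.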

This is all described in ?"["

--

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
