I have a dataset "mydf" with a field EMAIL_ADDRESS. When importing, I
specified:
mydf <- read.csv(file = extract, header = TRUE, stringsAsFactors = FALSE,
                 na.strings = c("NA", ""))

I've also tried setting na.strings = c("NA", "", "<NA>"), but I don't know
whether it's appropriate to put "<NA>" there.
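To show what I mean, here is a small sketch of the two na.strings settings. The CSV text and its values are made up for illustration and are not from my actual extract. As far as I can tell, adding "<NA>" only matters if the file literally contains the text <NA>; otherwise <NA> is just how R prints a missing character value.

```r
# Hypothetical CSV content, standing in for the real extract file
csv_text <- "EMAIL_ADDRESS,VALID_EMAIL
a@example.com,TRUE
,FALSE
NA,FALSE
<NA>,FALSE"

# Without "<NA>" in na.strings, the literal text <NA> is kept as a string
d1 <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE,
               na.strings = c("NA", ""))
is.na(d1$EMAIL_ADDRESS)

# With "<NA>" added, that literal text is also read in as missing
d2 <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE,
               na.strings = c("NA", "", "<NA>"))
is.na(d2$EMAIL_ADDRESS)
```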

I'm running:
a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)
dput(head(a, 5))

structure(list(EMAIL_ADDRESS = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), .Names = "EMAIL_ADDRESS",
row.names = c(17L, 22L, 23L, 24L, 30L), class = "data.frame")

The results show a lot of <NA> values, both on screen and in the dput output.

I don't quite understand why that happens. I would have expected the NA rows
to be excluded, since I passed na.rm = TRUE. Do you have any suggestions?
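Here is a minimal, self-contained sketch that reproduces the behaviour. The data frame below is hypothetical and only stands in for my real mydf. My guess is that subset() for data frames has no na.rm argument, so it is silently swallowed by "...", and the only variant I have found that drops the rows is to test for NA in the row condition itself. I may be missing a better way.

```r
# Hypothetical data frame standing in for mydf
mydf <- data.frame(
  EMAIL_ADDRESS = c("a@example.com", NA, "bad-address", NA),
  VALID_EMAIL   = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Same call as above: na.rm = TRUE does not drop the NA rows
a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)
sum(is.na(a$EMAIL_ADDRESS))

# Putting the NA test into the row condition instead
b <- subset(mydf, VALID_EMAIL == FALSE & !is.na(EMAIL_ADDRESS),
            select = EMAIL_ADDRESS)
b$EMAIL_ADDRESS
```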

Thanks!
-- 
Jeff
