I have a dataset "mydf" with a field EMAIL_ADDRESS. When importing, I specified:

    mydf <- read.csv(file = extract, header = TRUE, stringsAsFactors = FALSE, na.strings = c("NA", ""))

I've also tried setting na.strings = c("NA", "", "<NA>"), but I don't know whether it's appropriate to put <NA> there.

I'm running:

    a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)
    dput(head(a, 5))

which gives:

    structure(list(EMAIL_ADDRESS = c(NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_)), .Names = "EMAIL_ADDRESS",
    row.names = c(17L, 22L, 23L, 24L, 30L), class = "data.frame")

The results show a lot of <NA> values, both on screen and in the dput output. I don't understand why. I expected those rows to be excluded, since I passed na.rm = TRUE. Do you have any suggestions?

Thanks!

-- Jeff