On Jan 30, 2012, at 12:15 PM, David Winsemius wrote:

>
> On Jan 30, 2012, at 8:44 AM, Paul Miller wrote:
>
>> Hi Rui, Marc, and Gabor,
>>
>> Thanks for your replies to my question. All were helpful and it was
>> interesting to see how different people approach various aspects of the same
>> problem.
>>
>> Spent some time this weekend looking at Rui's solution, which is certainly
>> much clearer than my own. Managed to figure out pretty much all the details
>> of how it works. Also managed to tweak it slightly in order to make it do
>> exactly what I wanted. (See revised code below.)
>>
>> Still have a couple of questions though. The first concerns the insertion of
>> the code "Y > 2012" to set year values beyond 2012 to NA (on line 10 of the
>> function below). When I add this (or use it in place of "nchar(Y) > 4"),
>> the code successfully finds the problem date "05/16/2015". After that though,
>> it produces the following error message:
>>
>> Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid
>> date values in", : missing value where TRUE/FALSE needed
>
> It's a bit dangerous to use comparison operators on mixed data types. In your
> case you are comparing a character value to a numeric value and may not
> realize that 2015 is not the same as "2015". Try "123" > 1000 if you want a
> quick counter-example. You may want to coerce the Y value to "numeric" mode
> to be safe.
>
> Also 'any' does not expect the logical connectives. You probably want:
>
> any(is.na(x), M != "un", Y != "un")
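The points above are easy to check at the console. What follows is only a minimal sketch: x, M and Y are made-up stand-ins for the pieces used inside the function, not Paul's actual data. It shows that a character-versus-numeric comparison is done on strings, that the two forms of any() are not interchangeable (the comma form is TRUE if any element of any argument is TRUE, whereas the & form needs all three conditions to hold for the same element), and that any() can return NA, which is what makes if() stop with "missing value where TRUE/FALSE needed".

```r
# The number is coerced to character, so this is a string comparison.
chr_vs_num <- "123" > 1000               # compares "123" with "1000"
num_vs_num <- as.numeric("123") > 1000   # compares 123 with 1000

# Hypothetical stand-ins for the parsed date and its month/year pieces.
x <- c(NA, "2011-03-17")
M <- c("un", "03")
Y <- c("2011", "2011")

# Element-wise AND first, then any(): all three must hold for the same element.
both <- any(is.na(x) & M != "un" & Y != "un")

# Comma-separated arguments: TRUE if any element of any argument is TRUE.
either <- any(is.na(x), M != "un", Y != "un")

# any() returns NA when nothing is TRUE but at least one value is NA;
# if (NA) then fails. na.rm = TRUE avoids that.
na_result   <- any(c(FALSE, NA))
safe_result <- any(c(FALSE, NA), na.rm = TRUE)

print(c(chr_vs_num = chr_vs_num, num_vs_num = num_vs_num,
        both = both, either = either,
        na_result = na_result, safe_result = safe_result))
```

So if the intent is "the date failed to parse even though month and year were known", the & form is the one that expresses it, and adding na.rm = TRUE (or coercing Y with as.numeric() before comparing) is one way to keep if() from seeing an NA.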

Perhaps I am missing something relevant here, but I am still confused by what I see as over-engineering of the code being implemented.

If the primary requirements are:

1. Impute the 15th of the month if the day is 'un'

2. Reject dates prior to 1900 or after 2011

3. Reject dates with an unknown ('un') month or year

4. Reject years with >4 digits, also presuming that the value passed should always be 10 characters in length

If that is the basic functionality required, then a modest modification of my prior code should work:

    checkDate <- function(x) {
      # Replace unknown day with 15
      tmp <- gsub("/un/", "/15/", x)

      # Anything that cannot be parsed (bad day, 'un' month or year) becomes NA
      tmp2 <- as.Date(tmp, format = "%m/%d/%Y")

      # Return the original values that fail any of the checks
      as.character(x[is.na(tmp2) |
                     tmp2 < as.Date("1900/01/01") |
                     tmp2 > as.Date("2011/12/31") |
                     nchar(as.character(x)) > 10])
    }

    > TestDates
      Patient     birthDT diagnosisDT metastaticDT
    1       1 11/23/21931  05/23/2009   un/17/2011
    2       2  06/20/1840  02/30/2010   03/17/2011
    3       3  06/17/1935  12/20/2008   07/un/2011
    4       4  05/31/1937  01/18/2007   04/30/2011
    5       5  06/31/1933  05/16/2015     11/20/un

    > lapply(TestDates[, -1], checkDate)
    $birthDT
    [1] "11/23/21931" "06/20/1840"  "06/31/1933"

    $diagnosisDT
    [1] "02/30/2010" "05/16/2015"

    $metastaticDT
    [1] "un/17/2011" "11/20/un"

Does that not do what you require, Paul?

Marc

>
>>
>> Why is this happening? If the code correctly handles the date
>> "06/20/1840" without producing an error, why can't it do likewise with
>> "05/16/2015"?
>>
>> The second question is why it's necessary to put "x" on line 15 following
>> "cat("Warning ...)". I know that I don't get any date columns if I don't
>> include this but am not sure why.
>>
>> The third question is whether it's possible to change the class of the date
>> variables without using a for loop. I played around with this a little but
>> didn't find a vectorized alternative. It may be that this is not really
>> important. It's just that I've read in several places that for loops should
>> be avoided wherever possible.
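
On the third question, one possible way to change the class of several columns without an explicit for loop is to assign the result of lapply() back into the data frame. This is only a sketch under assumptions: the data frame is rebuilt from the TestDates printout in this thread, toDate() is a hypothetical helper name, and values that cannot be parsed simply become NA rather than being reported.

```r
# Test data copied from the TestDates printout in this thread.
TestDates <- data.frame(
  Patient      = 1:5,
  birthDT      = c("11/23/21931", "06/20/1840", "06/17/1935",
                   "05/31/1937", "06/31/1933"),
  diagnosisDT  = c("05/23/2009", "02/30/2010", "12/20/2008",
                   "01/18/2007", "05/16/2015"),
  metastaticDT = c("un/17/2011", "03/17/2011", "07/un/2011",
                   "04/30/2011", "11/20/un"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: impute day 15 for an unknown day, then parse.
# Unparseable values (bad day, 'un' month or year) come back as NA.
toDate <- function(x) {
  as.Date(gsub("/un/", "/15/", x), format = "%m/%d/%Y")
}

# Convert every column except Patient in one assignment, no for loop.
date_cols <- names(TestDates)[-1]
TestDates[date_cols] <- lapply(TestDates[date_cols], toDate)

str(TestDates)
```

lapply() is still a loop internally, so the gain is mostly in readability rather than speed; with only a handful of columns a for loop would be perfectly reasonable too.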
>>
>> Thanks,
>>
>> Paul

<snip prior content>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.