On Jan 30, 2012, at 1:30 PM, Marc Schwartz wrote:

>
> On Jan 30, 2012, at 12:15 PM, David Winsemius wrote:
>
>>
>> On Jan 30, 2012, at 8:44 AM, Paul Miller wrote:
>>
>>> Hi Rui, Marc, and Gabor,
>>>
>>> Thanks for your replies to my question. All were helpful and it was
>>> interesting to see how different people approach various aspects of the
>>> same problem.
>>>
>>> Spent some time this weekend looking at Rui's solution, which is certainly
>>> much clearer than my own. Managed to figure out pretty much all the details
>>> of how it works. Also managed to tweak it slightly in order to make it do
>>> exactly what I wanted. (See revised code below.)
>>>
>>> Still have a couple of questions though. The first concerns the insertion
>>> of the code "Y > 2012" to set year values beyond 2012 to NA (on line 10 of
>>> the function below). When I add this (or use it in place of "nchar(Y) >
>>> 4"), the code succesfully finds the problem date "05/16/2015". After that
>>> though, it produces the following error message:
>>>
>>> Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid
>>> date values in", : missing value where TRUE/FALSE needed
>>
>> It's a bit dangerous to use comparison operators on mixed data types. In
>> your case you are comparing a character value to a numeric value and may not
>> realize that 2015 is not the same as "2015". Try "123" > 1000 if you want a
>> quick counter-example. You may want to coerce the Y value to "numeric" mode
>> to be safe.
>>
>> Also 'any' does not expect the logical connectives. You probably want:
>>
>> any(is.na(x) , M != "un" , Y != "un")
>
>
> Perhaps I am missing something relevant here, but I am still confused by what
> I see as an over engineering of the code being implemented. If the primary
> requirements are:
>
> 1. Impute the 15th of month if it is 'un'
> 2. Reject dates prior to 1900 or after 2011
> 3. Reject dates with an unknown ('un') month or year
> 4. Reject years with >4 digits, also presuming that the value passed should
> always be 10 characters in length
>
> If that is the basic functionality required, then a modest modification of my
> prior code should work:

Ack...typo in my code for the upper end of the date range. Should be:

checkDate <- function(x) {
  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")
  as.character(x[is.na(tmp2) |
                 tmp2 < as.Date("1900/01/01") |
                 tmp2 > as.Date("2011/12/31") |
                 nchar(as.character(x)) > 10])
}

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
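
For anyone wanting to try the corrected function against the four requirements listed in the message, here is a minimal sketch run on a few made-up dates. The input vector `dates` and the name `rejected` are illustrative only and do not come from the thread. The function body repeats the one posted, with `x` coerced to character up front; that coercion is an assumption, added so that factor input would behave the same as character input.

```r
# Same logic as the posted checkDate(), with x coerced to character
# up front (an assumption, so factors and strings are treated alike).
checkDate <- function(x) {
  x <- as.character(x)
  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")
  # Return the values that fail any of the checks
  x[is.na(tmp2) |
      tmp2 < as.Date("1900/01/01") |
      tmp2 > as.Date("2011/12/31") |
      nchar(x) > 10]
}

# Made-up test values, one per rule
dates <- c("01/15/2010",   # valid, kept
           "03/un/2005",   # unknown day, imputed to the 15th, kept
           "un/12/2001",   # unknown month, rejected
           "07/04/un",     # unknown year, rejected
           "05/16/2015",   # after 2011, rejected
           "12/31/1899",   # before 1900, rejected
           "06/01/20100")  # more than 10 characters, rejected

rejected <- checkDate(dates)
print(rejected)
```

The function returns the offending values rather than a logical vector, so an empty result means every date passed.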