On Jan 30, 2012, at 1:30 PM, Marc Schwartz wrote:

> 
> On Jan 30, 2012, at 12:15 PM, David Winsemius wrote:
> 
>> 
>> On Jan 30, 2012, at 8:44 AM, Paul Miller wrote:
>> 
>>> Hi Rui, Marc, and Gabor,
>>> 
>>> Thanks for your replies to my question. All were helpful and it was 
>>> interesting to see how different people approach various aspects of the 
>>> same problem.
>>> 
>>> Spent some time this weekend looking at Rui's solution, which is certainly 
>>> much clearer than my own. Managed to figure out pretty much all the details 
>>> of how it works. Also managed to tweak it slightly in order to make it do 
>>> exactly what I wanted. (See revised code below.)
>>> 
>>> Still have a couple of questions though. The first concerns the insertion 
>>> of the code "Y > 2012" to set year values beyond 2012 to NA (on line 10 of 
>>> the function below).  When I add this (or use it in place of "nchar(Y) > 
>>> 4"), the code successfully finds the problem date "05/16/2015". After that 
>>> though, it produces the following error message:
>>> 
>>> Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid 
>>> date values in",  :  missing value where TRUE/FALSE needed
>> 
>> It's a bit dangerous to use comparison operators on mixed data types. In 
>> your case you are comparing a character value to a numeric value and may not 
>> realize that 2015 is not the same as "2015". Try "123" > 1000 if you want a 
>> quick counter-example. You may want to coerce the Y value to "numeric" mode 
>> to be safe.
>> 
>> Also 'any' does not expect the logical connectives. You probably want:
>> 
>> any(is.na(x) , M != "un" , Y != "un")
> 
> 
> Perhaps I am missing something relevant here, but I am still confused by what 
> I see as over-engineering of the code being implemented. If the primary 
> requirements are:
> 
> 1. Impute the 15th of month if it is 'un'
> 2. Reject dates prior to 1900 or after 2011
> 3. Reject dates with an unknown ('un') month or year
> 4. Reject years with >4 digits, also presuming that the value passed should 
> always be 10 characters in length
> 
> If that is the basic functionality required, then a modest modification of my 
> prior code should work:


Ack...typo in my code for the upper end of the date range. Should be:

checkDate <- function(x) {

  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)

  # Unparseable values (including an 'un' month or year) become NA
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")

  # Return the original values that fail any of the checks
  as.character(x[is.na(tmp2) |
                 tmp2 < as.Date("1900/01/01") |
                 tmp2 > as.Date("2011/12/31") |
                 nchar(as.character(x)) > 10])
}
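
A quick way to check the function against the four requirements is to run it on a handful of values. The input below is made up for illustration (it is not Paul's data), and the function is repeated only so that the snippet runs on its own:

```r
# Same function as above, repeated so the example is self-contained
checkDate <- function(x) {
  tmp <- gsub("/un/", "/15/", x)               # impute the 15th for unknown day
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")    # NA if the value cannot be parsed
  as.character(x[is.na(tmp2) |
                 tmp2 < as.Date("1900/01/01") |
                 tmp2 > as.Date("2011/12/31") |
                 nchar(as.character(x)) > 10])
}

# Hypothetical test values, one per case of interest
dates <- c("01/15/2010",   # valid, kept
           "03/un/2005",   # unknown day, imputed to the 15th, kept
           "un/12/2001",   # unknown month, rejected
           "12/15/un",     # unknown year, rejected
           "05/16/2015",   # after 2011, rejected
           "07/04/1899",   # before 1900, rejected
           "02/30/2000",   # not a real calendar date, rejected
           "05/16/20115")  # more than 10 characters, rejected

# The function returns the rejected values, in their original form
bad <- checkDate(dates)
print(bad)
```

Only the first two values should pass, so 'bad' should hold the remaining six. Note that the function returns the rejected values rather than the cleaned dates; if the cleaned dates are wanted as well, tmp2 would have to be returned in addition.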

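On the two points David raised, a small sketch may help. The vectors below are invented for illustration, and since Paul's revised function is not shown here, the NA explanation is only one plausible source of the "missing value where TRUE/FALSE needed" error, not a diagnosis:

```r
# Comparing a character value with a number coerces the number to
# character, so the comparison is done on strings, not on numbers.
chr_cmp <- "123" > 1000               # TRUE: "123" sorts after "1000"
num_cmp <- as.numeric("123") > 1000   # FALSE: numeric comparison

# Hypothetical year values; "un" cannot be converted and becomes NA
Y <- c("2011", "2015", "un")
Y_num <- suppressWarnings(as.numeric(Y))
too_late <- Y_num > 2012              # FALSE TRUE NA

# Once a year has been set to NA, a test such as Y != "un" yields NA,
# and TRUE & NA is NA, so any() can return NA instead of TRUE/FALSE.
x_missing <- c(FALSE, TRUE)
Y2 <- c("2011", NA)
flag <- x_missing & Y2 != "un"        # FALSE NA
res_na <- any(flag)                   # NA: if (NA) stops with an error
res_ok <- any(flag, na.rm = TRUE)     # FALSE: safe to use inside if()

# any(a, b) is TRUE when any element of either argument is TRUE,
# which is a different test from any(a & b).
a <- c(TRUE, FALSE)
b <- c(FALSE, TRUE)
any_and <- any(a & b)                 # FALSE
any_sep <- any(a, b)                  # TRUE
```

So the '&' form is accepted by any(); the thing to guard against is the NA, either with na.rm = TRUE or by wrapping the call in isTRUE(). Passing the conditions as separate arguments changes the test from "all three hold for some element" to "at least one holds somewhere", which is probably not what was intended.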

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
