Comments interspersed below

From: Marie Sivertsen [mailto:mariesiv...@gmail.com] 
Sent: Thursday, January 22, 2009 1:17 PM
To: Greg Snow
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] Unexpected behaviour of the as.Date (was: Error as.Date on 
Invalid Dates)

 [snip]


For your question, the help page for as.Date includes:

 "format: A character string.  The default is '"%Y-%m-%d"'.  For
         details see 'strftime'."


Strictly speaking, neither "1/13/2001" nor "13/1/2001" matches the format, so I 
think both should raise an error.  Since the behaviour seems not to apply the 
default strictly, why should one not expect "13/1/2001" to be parsed in the 
only reasonable way?

 
The help page for as.Date refers to the help page for strptime which says that 
details are system specific. So there may be some systems where you would get 
an error from '/' not being '-', but apparently on your system they are treated 
the same.  Personally I see a big difference between interpreting an obvious 
separator as such and changing the order of values.  The fact that it sometimes 
gets the former right does not imply to me that the latter should happen 
automatically.

Dealing with the separators can be done on an individual basis as each 
character string is processed.  Guessing the order of the entries could require 
looking at the entire vector/file/dataset, which I expect would slow things 
down quite a bit.  (And how long would it be before someone complained that it 
processed file A correctly, while file B should have been treated like A but, 
because it only included days below 13, the program could not tell?)
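
To make the point about field order concrete, here is a minimal sketch that 
passes the format explicitly instead of relying on the default.  The variable 
names are only for illustration, and the format strings assume 
slash-separated numeric dates with a four-digit year.

```r
# Two strings with the same separators but a different field order.
us_style <- "1/13/2001"   # month/day/year
eu_style <- "13/1/2001"   # day/month/year

# Spell out the order instead of relying on the default format.
d1 <- as.Date(us_style, format = "%m/%d/%Y")
d2 <- as.Date(eu_style, format = "%d/%m/%Y")

# Both now refer to the same day, 13 January 2001.
d1 == d2
format(d1, "%Y-%m-%d")

# The wrong order is not silently corrected: there is no month 13,
# so the result is NA rather than a guessed date.
is.na(as.Date(eu_style, format = "%m/%d/%Y"))

# With days below 13 nothing in the string reveals the intended order;
# both formats succeed and give different dates.
ambiguous <- "1/2/2001"
as.Date(ambiguous, format = "%m/%d/%Y")   # 2 January 2001
as.Date(ambiguous, format = "%d/%m/%Y")   # 1 February 2001
```

The last two calls are the "file B" case: no error, no warning, and only the 
person who knows where the data came from can say which answer is right.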


And

"Character strings are processed as far as
    necessary for the format specified: any trailing characters are
    ignored."

I don't see anything in your examples that runs counter to the above.


Yes they do.  None of them matches the format, but some parse correctly, some 
produce rubbish, and some raise an error.  Maybe you want to improve the help 
page for as.Date to say something like "The default is a sequence of numerical 
representations of the year, then the month, then the day, separated by one of 
'-', '/', ...", which would make it clearer.

But would that be correct?  It may be system dependent (or all systems may 
behave the same by now).  How about having the help page tell you to find out 
for your own system?  (Easy fix: it already does.)
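
One way to find out for your own system is simply to try a few inputs and 
look at what comes back.  The sketch below does that; 'probe' is a 
hypothetical helper written for this message, not part of R, and I 
deliberately give no expected results for the slash-separated inputs since 
those may differ between R versions and platforms.

```r
# Try the default parsing on one string and report either the parsed
# date or the error message, so a failure does not stop the loop.
probe <- function(x) {
  tryCatch(
    format(as.Date(x)),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

inputs <- c("2001-01-13", "2001/01/13", "1/13/2001", "13/1/2001")

# One result per input, named by the input string.
results <- vapply(inputs, probe, character(1))
print(results)
```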

Remember that computers do exactly what you tell them to do, not what you think 
that they should do.


Computers do exactly what they were programmed to do, and what they will do 
depends on what the developer told them to do when they are given certain 
input.  I expect them to do exactly what I tell them to do, and that is to 
parse "1/13/2001" in the only reasonable way.  It seems that someone told them 
to do something else...

I was using the general 'you' above, which includes the programmer as well as 
the user.  Since you (singular) did not specify the format, the computer used 
the default format that the programmer (part of the collective 'you') 
specified, which says the order is year, month, day.

Many problems come as a result of users forgetting that they are smarter than 
the computer.  I see 3 ways to remedy the problem:

1. Make computers that are as smart or smarter than people.
2. Make the programmers anticipate every way that someone may use a particular 
function and implement all of that functionality, even if they don't think it 
is worth the time/effort because there is an easy workaround for many of the 
less commonly used features.
3. Don't expect the computer to guess correctly and tell it exactly what you 
want it to do.

I don't think that number 1 will ever happen, and there are plenty of science 
fiction stories that suggest problems with even trying.

Option 2 stinks of hubris, and even if it were possible, I personally would not 
want to wait until they were finished before being able to use the 
functions/programs.

That leaves option 3, which I think is the best approach even without the 
arguments against the others.
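
As a rough sketch of what option 3 can look like in practice, the function 
below always takes an explicit format and stops on anything it cannot parse 
instead of quietly returning NA.  'parse_date_strict' is a name I made up 
for this example; it is not an existing R function.

```r
# Parse a character vector with an explicit format and refuse to
# continue if any non-missing entry fails to parse.
parse_date_strict <- function(x, format) {
  d <- as.Date(x, format = format)
  bad <- is.na(d) & !is.na(x)   # entries that were given but not parsed
  if (any(bad)) {
    stop("could not parse as '", format, "': ",
         paste(unique(x[bad]), collapse = ", "))
  }
  d
}

# Day/month/year input, stated explicitly.
parse_date_strict(c("13/1/2001", "2/3/2001"), "%d/%m/%Y")

# The same format applied to month/day/year input fails loudly,
# because there is no month 13.
try(parse_date_strict("1/13/2001", "%d/%m/%Y"))
```

Note that this only catches strings that cannot be parsed at all.  It cannot 
detect a vector in which every day is below 13 and the fields are in the 
other order; only knowing your data helps with that.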

I think the moral of this story is: program defensively, always specify a date 
format! 


Mvh.
Marie



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
