Thank you, Greg, for your explanations. I think you have explained the problem clearly now.
Best regards,
Marie

On Thu, Jan 22, 2009 at 10:24 PM, Greg Snow <greg.s...@imail.org> wrote:

> Comments interspersed below
>
> From: Marie Sivertsen [mailto:mariesiv...@gmail.com]
> Sent: Thursday, January 22, 2009 1:17 PM
> To: Greg Snow
> Cc: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Unexpected behaviour of the as.Date (was: Error as.Date on
> Invalid Dates)
>
> [snip]
>
> For your question, the help page for as.Date includes:
>
> "format: A character string. The default is '"%Y-%m-%d"'. For
> details see 'strftime'."
>
> To be strict, neither "1/13/2001" nor "13/1/2001" matches the format, so both
> should raise an error, I think. Since the behaviour seems not to apply the
> default strictly, why ought one think "13/1/2001" will not be parsed the
> only reasonable way?
>
> The help page for as.Date refers to the help page for strptime which says
> that details are system specific. So there may be some systems where you
> would get an error from '/' not being '-', but apparently on your system
> they are treated the same. Personally I see a big difference between
> interpreting an obvious separator as such and changing the order of values.
> The fact that it sometimes gets the one correct does not imply to me that
> the other should happen automatically.
>
> Dealing with the separators can be done on an individual basis as each
> character string is processed. Guessing the order of the entries could
> require looking at the entire vector/file/dataset, which I expect would slow
> things down quite a bit. (And how long would it be before someone
> complained that it processed file A correctly, but file B should have been
> treated like A, but since it only included days less than 13, the program
> did not realize this?)
>
> And
>
> "Character strings are processed as far as
> necessary for the format specified: any trailing characters are
> ignored."
>
> I don't see anything in your examples that runs counter to the above.
>
> Yes they do. None of them match the format, but some parse correctly, some
> produce rubbish, and some raise an error. Maybe you want to improve the help
> page for as.Date to say something like "The default is a sequence of
> numerical representations of the year, then the month, then the day,
> separated by one of '-', '/', ...", which makes it clearer.
>
> But is it correct? It may be system dependent (or all systems may do the
> exact same now). How about if the help page tells you to find out for your
> system (easy fix, it already does).
>
> Remember that computers do exactly what you tell them to do, not what you
> think that they should do.
>
> Computers do exactly what they were programmed to do, and what they will do
> depends on what the developer told them to do when they are given certain
> input. I expect them to do exactly what I tell them to do, and it is to
> parse "1/13/2001" the only reasonable way. It seems that someone told them
> to do something else...
>
> I was using the general 'you' above that includes the programmer as well as
> the user. Since you (singular) did not specify the format, the computer used
> the default format that the programmer (part of the collective 'you')
> specified, which says the order is year, month, day.
>
> Many problems come as a result of users forgetting that they are smarter
> than the computer. I see 3 ways to remedy the problem:
>
> 1. Make computers that are as smart or smarter than people.
> 2. Make the programmers anticipate every way that someone may use a
> particular function and make them implement all of the functionality even if
> they don't think it is worth the time/effort since there is an easy work
> around for many of the less likely used features.
> 3. Don't expect the computer to guess correctly and tell it exactly what
> you want it to do.
>
> I don't think that number 1 will ever happen, and there are plenty of
> science fiction stories that suggest problems with even trying.
>
> Option 2 stinks of hubris, and even if it were possible, I personally would
> not want to wait until they were finished before being able to use the
> functions/programs.
>
> Which leaves option 3, which I think is the best approach even without
> arguments against the others.
>
> I think the moral of this story is: program defensively, always specify a
> date format!
>
> Best regards,
> Marie
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
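
The "always specify a date format" advice from the thread can be sketched roughly as below. This is only an illustration, not anything posted in the discussion: parse_date_strict is a hypothetical helper name (it is not part of base R), and the sketch assumes that as.Date returns NA for a string that cannot be read with the given format, which is then turned into an error rather than passed along silently.

```r
# Hypothetical helper (not part of base R): parse dates with an explicit
# format and stop if any non-missing input fails to parse.
parse_date_strict <- function(x, format) {
  parsed <- as.Date(x, format = format)
  failed <- is.na(parsed) & !is.na(x)
  if (any(failed)) {
    stop("could not parse with format '", format, "': ",
         paste(x[failed], collapse = ", "))
  }
  parsed
}

# The two strings from the thread, each read with the format it was written in.
us_style <- parse_date_strict("1/13/2001", "%m/%d/%Y")  # month/day/year
eu_style <- parse_date_strict("13/1/2001", "%d/%m/%Y")  # day/month/year
print(us_style == eu_style)

# Reading the first string with the wrong field order is rejected
# (there is no month 13) instead of being guessed at.
wrong_order <- tryCatch(
  parse_date_strict("1/13/2001", "%d/%m/%Y"),
  error = function(e) conditionMessage(e)
)
print(wrong_order)
```

With the format stated explicitly, nothing depends on the system-specific handling of the default, and a field-order mismatch shows up as an error at the point where the data is read.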