Hi Yohan,

On Wed, Jan 28, 2009 at 10:28 AM, Yohan Chalabi <chal...@phys.ethz.ch> wrote:
> >>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
> >>>> on Wed, 28 Jan 2009 09:30:58 -0500
>
> TB> It is certain that all entries have the same format, but I'm
> TB> starting to think that the error message is something of a
> TB> red herring. Consider this:
> TB>
> TB> > year = 2009
> TB> > week = 0
> TB> > day = 3
> TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
> TB> [1] "2009-0-3"
> TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
> TB> > date1
> TB> GMT
> TB> [1] [NA]
> TB> > day = 4
> TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
> TB> [1] "2009-0-4"
> TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
> TB> > date1
> TB> GMT
> TB> [1] [2009-01-01]
> TB> >
> TB> > datestr = sprintf("%i-%i-%i",year,week,3);datestr
> TB> [1] "2009-0-3"
> TB> > date2 = timeDate(datestr, format = "%Y-%U-%w");date2
> TB> GMT
> TB> [1] [NA]
> TB> > difftimeDate(date2,date1, units = "weeks")
> TB> Error in midnightStandard(charvec, format) :
> TB>   'charvec' has non-NA entries of different number of characters
> TB> In addition: Warning messages:
> TB> 1: In min(x) : no non-missing arguments to min; returning Inf
> TB> 2: In max(x) : no non-missing arguments to max; returning -Inf
> TB>
> TB> The first values for year, week and day are the values on
> TB> which my loop dies. It returns 'NA' here. It seems clear that
> TB> it is returning NA because the date that data corresponds to
> TB> is 2008-12-31.
> TB>
> TB> The error is being produced by difftimeDate rather than timeDate
> TB> (as shown by the above session). But that represents a flaw in
> TB> the function design.
>
> This is not a flaw in timeDate. It behaves the same way as
> 'as.POSIXct'
>

That the two behave the same doesn't change the assessment that the design is flawed. That doesn't mean that the function is wrong. It means only that the behaviour can be made more useful.
For example, in SQL, if a given calculation returns NULL and the result is subsequently used in another calculation, the result of that second calculation is also NULL. That is quite useful, and admits algorithms that can react appropriately to NULLs when necessary. It is arguably better than forcing the code to fail the moment a NULL is used in a secondary calculation. In C++, OTOH, one can catch the problem earlier using, e.g., exceptions, again allowing the program to complete even when problems arise for certain values or combinations thereof.

As a software engineer, I understand the issues involved in creating libraries. If I wanted to incorporate the functionality of a given standard suite of functions (e.g. the ANSI C standard library functions, or the POSIX functions), my first step would be to ensure I could duplicate how they behave. But I would not stop there. There are, for example, serious design flaws in many ANSI C functions that, if ignored, introduce serious security defects in applications that use them. I would therefore refactor them to eliminate the security defects. If the defects cannot be eliminated, I would replace the function in question with a similar function that does not have them. POSIX is a useful, but old, standard, and I am merely suggesting that once you have duplicated it, you look beyond it to ways it can be improved upon.

There is more to the design of a function than whether or not it gives the right result with good input. There is also how it behaves when there is a problem with the inputs: whether you force the calling code to die when a problem arises, or you give the calling code a way to react to such problems.
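To make the contrast concrete, here is a minimal sketch in base R (not timeDate). The helper name na_on_error is hypothetical, made up purely for illustration. It shows the two styles described above: base R date arithmetic that propagates NA much as SQL propagates NULL, and a call that stops with an error being caught so that a loop can carry on.

```r
# Hypothetical helper (not part of R or timeDate): evaluate an expression
# and return NA instead of stopping if it raises an error.
na_on_error <- function(expr) {
  tryCatch(expr, error = function(e) NA)
}

# Base R date arithmetic propagates NA rather than failing.
d_bad  <- as.Date(NA)
d_good <- as.Date("2009-01-01")
gap <- difftime(d_bad, d_good, units = "weeks")
is.na(gap)

# A call that stops with an error can be wrapped so the caller decides
# what to do next; stop() here merely stands in for the failing call.
res <- na_on_error(stop("'charvec' has non-NA entries of different number of characters"))
is.na(res)
```

Either style leaves the decision about how to react with the calling code, which is the design point being argued here.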
When I add functions to my own C++ or Java libraries, I normally include more bad input data in the unit tests than good data (though the good data alone is enough to verify that correct results are obtained), precisely so I can document how the function behaves when there is a problem and give coders who use it a variety of options for dealing with it.

> strptime(datestr, format = "%Y-%U-%w")
>
> Instead of claiming that there is a flaw in the function you could have
> suggested an 'is.na' method for 'timeDate'.
>

At the time, I did not know about is.na. I have spent the past hour trying is.na, but to no avail. I guess that is no surprise to you, but the fact that it would fail is not reflected in the R documentation of is.na. That mentions S3, but not S4. As I only recently started using R, I have not yet looked at what S3 and S4 are, so that is a few more hours of study before I get this problem solved.

> I will add an 'is.na' method in the dev version of 'timeDate'.
>

Thanks. I'll benefit from that once it makes it into the production release. In the meantime, I need to find a way to do something similar now, in my script.

Thanks

Ted
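P.S. For the interim, a rough sketch of the kind of workaround I have in mind, using only base R so that is.na() applies to the parsed dates. The helpers parse_ywd and weeks_between are hypothetical names of my own. I am assuming base strptime treats %U the way the session above suggests, and I have not checked that on every platform.

```r
# Hypothetical helper: build a date-time from year, week-of-year and weekday
# with base R parsing, so that is.na() works on the result.
parse_ywd <- function(year, week, day) {
  datestr <- sprintf("%i-%i-%i", year, week, day)
  as.POSIXct(strptime(datestr, format = "%Y-%U-%w", tz = "GMT"))
}

# Hypothetical helper: difference in weeks that yields NA, instead of an
# error, when either date failed to parse.
weeks_between <- function(later, earlier) {
  if (is.na(later) || is.na(earlier)) return(NA_real_)
  as.numeric(difftime(later, earlier, units = "weeks"))
}

d1 <- parse_ywd(2009, 0, 4)
d2 <- parse_ywd(2009, 0, 3)   # the combination my loop died on
weeks_between(d2, d1)         # the loop can test this with is.na() and move on
```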