Re: [R] Mystery Error in midnightStandard

Ted Byers Wed, 28 Jan 2009 08:42:24 -0800

Hi Yohan,

On Wed, Jan 28, 2009 at 10:28 AM, Yohan Chalabi <chal...@phys.ethz.ch> wrote:


> >>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
> >>>> on Wed, 28 Jan 2009 09:30:58 -0500
>
>   TB> It is certain that all entries have the same format, but I'm
>   TB> starting to
>   TB> think that the error message is something of a red herring.
>   TB> Consider this:
>   TB>
>   TB> > year = 2009
>   TB> > week = 0
>   TB> > day = 3
>   TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
>   TB> [1] "2009-0-3"
>   TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
>   TB> > date1
>   TB> GMT
>   TB> [1] [NA]
>   TB> > day = 4
>   TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
>   TB> [1] "2009-0-4"
>   TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
>   TB> > date1
>   TB> GMT
>   TB> [1] [2009-01-01]
>   TB> >
>   TB> > datestr = sprintf("%i-%i-%i",year,week,3);datestr
>   TB> [1] "2009-0-3"
>   TB> > date2 = timeDate(datestr, format = "%Y-%U-%w");date2
>   TB> GMT
>   TB> [1] [NA]
>   TB> > difftimeDate(date2,date1, units = "weeks")
>   TB> Error in midnightStandard(charvec, format) :
>   TB> 'charvec' has non-NA entries of different number of characters
>   TB> In addition: Warning messages:
>   TB> 1: In min(x) : no non-missing arguments to min; returning Inf
>   TB> 2: In max(x) : no non-missing arguments to max; returning -Inf
>   TB>
>   TB>
>   TB>
>   TB> The first values for year, week and day are the values on
>   TB> which my loop
>   TB> dies.  It returns 'NA' here.  It seems clear that it is
>   TB> returning NA because
>   TB> the date that data corresponds to is 2008-12-31.
>   TB>
>   TB> The error is being produced by difftimeDate rather than timeDate
>   TB> (as shown
>   TB> by the above session).  But that represents a flaw in the
>   TB> function design.
>
> This is not a flaw in timeDate. It behaves the same way as
> 'as.POSIXct'.
>

That the two behave the same way doesn't change my assessment that the design
is flawed.  That doesn't mean that the function is wrong.  It means only
that the behaviour can be made more useful.  For example, in SQL, if a given
calculation returns NULL, and the result is subsequently used in another
calculation, the result of that calculation is also NULL.  That is quite
useful, and allows algorithms to react appropriately to NULLs when necessary.
That is arguably better than forcing the code to fail the moment a NULL is
used in a secondary calculation.  In C++, OTOH, one can catch the problem
earlier using, e.g., exceptions, again allowing the program to complete even
when problems arise for certain values or combinations thereof.
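
To put the two ideas in R terms, here is a minimal sketch using base R only.
Plain arithmetic already propagates NA much as SQL propagates NULL, and
tryCatch() plays the role of a C++ exception handler.  The names 'elapsed',
'rate' and 'or_na' are made up for illustration; they come from no package.

```r
# Propagation: a missing value flows through the calculation instead of
# stopping the script, and the caller decides what to do about it.
elapsed <- c(7, NA, 21)      # pretend the middle value failed upstream
rate <- elapsed / 7          # 1 NA 3
ok <- !is.na(rate)           # react to the missing values when necessary
mean(rate[ok])               # 2

# Catching: a hypothetical helper that turns any error into NA so that a
# loop can carry on with the remaining values.
or_na <- function(expr) tryCatch(expr, error = function(e) NA)
or_na(stop("boom"))          # NA
or_na(1 + 1)                 # 2
```

Either style lets a loop over many dates finish and report the problem cases
afterwards, rather than dying on the first one.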

As a software engineer, I understand the issues involved in creating
libraries.  If I want to incorporate the functionality of a given standard
suite of functions (e.g. ANSI C standard library functions, or POSIX
functions), my first step would be to ensure I can duplicate how they
behave.  But I would not stop there.  There are, for example, serious design
flaws in many ANSI C functions that, if ignored, introduce serious security
defects in applications that use them.  I would therefore refactor them to
eliminate the security defects.  If they cannot be eliminated, I would
replace the function in question by a similar function that does not have
that security defect.

POSIX is a useful, but old, standard, and I am merely suggesting that once
you have duplicated it, look beyond it to ways it can be improved upon.
There is more to the design of a function than whether or not it gives the
right result with good input.  There is how it behaves when there is a
problem with the inputs and whether or not you force the calling code to die
when a problem arises or you give the calling code a way to react to such
problems.  When I add functions to my own C++ or Java libraries, I normally
include more bad input data in the unit tests than good data (though the
latter is sufficient to ensure correct results are invariably obtained),
precisely so I can document how it behaves when there is a problem and give
coders who use it a variety of options to use to deal with them.
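
As a rough illustration of that testing habit, translated into R rather than
C++ or Java, the sketch below checks one good input and several bad ones.
'parse_day' is a hypothetical function invented for this example, built on
base R's as.Date(); it is not part of timeDate.

```r
# Hypothetical function under test: returns NA for anything it cannot parse.
parse_day <- function(x) as.Date(x, format = "%Y-%m-%d")

# One good input is enough to show the normal path works.
stopifnot(format(parse_day("2009-01-28")) == "2009-01-28")

# Bad inputs: record what actually happens so callers know what to expect.
stopifnot(is.na(parse_day("2009-02-30")))    # no such day in February
stopifnot(is.na(parse_day("not a date")))    # wrong format entirely
stopifnot(is.na(parse_day(NA_character_)))   # missing input stays missing
```

Each stopifnot() line doubles as documentation of the behaviour on bad input.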


>
> strptime(datestr, format = "%Y-%U-%w")
>
> Instead of claiming that there is a flaw in the function you could have
> suggested an 'is.na' method for 'timeDate'.
>

At the time, I did not know about is.na.  I have spent the past hour trying
is.na, but to no avail.  I guess that is no surprise to you, but that it
would fail is not reflected in the R documentation of is.na.  That mentions
S3, but not S4.  As I just recently started using R, I have not yet looked
at what S3 and S4 are, so that is a few more hours of study before I get
this problem solved.


>
> I will add an 'is.na' method in the dev version of 'timeDate'.
>
>
Thanks.  I'll benefit from that once it makes it into the production
release.  In the meantime, I need to find a way to make something similar
now, in my script.
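
One possible stopgap, sketched with base R only: screen the strings with
strptime() before they ever reach timeDate() or difftimeDate().  This leans
on the statement quoted above that timeDate behaves the same way as
as.POSIXct, which I have not verified for every input.  'is_unparseable' is
a name made up for this sketch.

```r
# Hypothetical helper: TRUE where a string does not parse under 'format'.
# Assumption: timeDate() yields NA for the same inputs as strptime().
is_unparseable <- function(charvec, format) {
  is.na(as.POSIXct(strptime(charvec, format = format, tz = "GMT")))
}

# The two date strings from the session quoted above.
datestr <- sprintf("%i-%i-%i", 2009, 0, c(3, 4))
bad <- is_unparseable(datestr, "%Y-%U-%w")

# Only the strings that parsed would be passed on to timeDate().
usable <- datestr[!bad]
```

Inside a loop, the same check can be used to skip an iteration or record the
offending year, week and day for later inspection.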

Thanks

Ted

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
