As an example of the perils, Excel actually has two possible origins,
which you can select as an option but whose default differs by OS. And
one of them in fact has different origins for dates before and after
1900-02-28. From the help for as.Date in R-devel:
% http://support.microsoft.com/kb/214330
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## thinking 1900 was a leap year.
## So for recent dates from Windows Excel
as.Date(35981, origin="1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin="1904-01-01") # 1998-07-05
(the example is from the URL given).
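A minimal sketch of handling both Excel date systems in R, assuming the serial numbers are whole days with no time fraction. The function name excel_to_date is hypothetical (it is not part of base R). It applies the two origins from the help example, and also handles the leap-year quirk: in the 1900 system, serial 60 is the nonexistent 1900-02-29 (mapped to NA here), and serials 1 to 59 need an origin one day later than serials from 61 onwards.

```r
## Hypothetical helper, not part of base R: convert Excel serial day
## numbers (whole days, no time fraction) to R "Date" objects.
excel_to_date <- function(serial, system = c("1900", "1904")) {
  system <- match.arg(system)
  if (system == "1904") {
    ## Mac default: 1904-01-01 is day 0
    return(as.Date(serial, origin = "1904-01-01"))
  }
  ## Windows default: 1900-01-01 is day 1, but Excel also counts the
  ## nonexistent 1900-02-29 as day 60, so serials from 61 onwards
  ## need an origin one day earlier than serials 1 to 59.
  days <- ifelse(serial >= 61,
                 serial + as.numeric(as.Date("1899-12-30")),
                 serial + as.numeric(as.Date("1899-12-31")))
  days[which(serial == 60)] <- NA   # the fictitious 1900-02-29
  as.Date(days, origin = "1970-01-01")
}

excel_to_date(35981)                  # "1998-07-05"
excel_to_date(34519, system = "1904") # "1998-07-05"
excel_to_date(c(59, 60, 61))          # "1900-02-28" NA "1900-03-01"
```

Returning NA for serial 60, rather than silently shifting it to a neighbouring day, is a design choice of this sketch; other converters may handle that day differently.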
This is the company whose C runtime declares datetimes before
1970-01-01 as invalid. Not those before 1970-01-01 in UTC (some versions
of glibc misread the POSIX standard to say that), but datetimes before
1970-01-01 08:00:00 in UTC!
On Thu, 25 Mar 2010, Marc Schwartz wrote:
On Mar 25, 2010, at 5:41 PM, Joshua Wiley wrote:
Kind of off the thread a bit, but when I do:
as.Date(40182)
I ***do not*** get "2080-01-06". Instead I get an error:
Error in as.Date.numeric(40182) : 'origin' must be supplied
Am I the only user who gets picked on in this way, or does it
happen to others as well? The help on as.Date() clearly specifies
that "origin" must be supplied. So how come Anna got the result that
she did?
I also get that error. I believe there was a thread a few years back
discussing the merits of including a default origin, but to my
knowledge, it was never implemented, and there is no way to set a
default (e.g., through options()).
That is because what was implemented was already an informed choice.
That was actually just recently discussed again, although I think part of it
occurred offlist.
The reason is that as.Date() on a number is intended for dates, like these, that
come from other applications and have been converted to a numeric offset from
some origin. In that situation it is likely, as here, that the origin is not the
same as R's.
Thus, there is a reasonable argument for compelling the user to know the origin
used by the application from which the data was obtained. Otherwise, a user who
does not check their data will get wrong dates without any warning.
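To make the risk concrete, here is the number from this thread read against R's origin and against the two Excel origins quoted from the help page above. Which reading is correct depends on the application the data came from, which the thread does not state; the point is only that the choice moves the result by decades.

```r
## The same day count gives very different dates under different origins.
n <- 40182
r_date   <- as.Date(n, origin = "1970-01-01")  # R's own origin
win_date <- as.Date(n, origin = "1899-12-30")  # Windows Excel, recent dates
mac_date <- as.Date(n, origin = "1904-01-01")  # Mac Excel (1904 system)
r_date    # "2080-01-06"
win_date  # "2010-01-04"
mac_date  # "2014-01-05"
```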
If one has dates that have come from R and that have been coerced to numeric, for example
via data manipulation (e.g. using aggregate(), etc.), one only needs to reset the class
back to "Date":
# from ?as.Date
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
z
[1] "1960-01-01" "1960-01-02" "1960-03-31" "1960-07-30"
# same as unclass(z)
y <- as.numeric(z)
y
[1] -3653 -3652 -3563 -3442
# Note that R's origin is 1970-01-01
as.Date(y, origin = "1970-01-01")
[1] "1960-01-01" "1960-01-02" "1960-03-31" "1960-07-30"
However, all you really need is:
class(y) <- "Date"
y
[1] "1960-01-01" "1960-01-02" "1960-03-31" "1960-07-30"
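Marc mentions aggregate(); a different, well-known case is base R's ifelse(), which returns a plain numeric vector when given Dates. A minimal sketch of the same class-reset fix, assuming (as in Marc's example) that the numbers came from R and therefore already count days from 1970-01-01:

```r
## ifelse() drops the "Date" class and returns plain day counts
dates   <- as.Date(c("2010-03-25", "2010-03-26"))
shifted <- ifelse(dates > as.Date("2010-03-25"), dates + 7, dates)
class(shifted)   # "numeric"
shifted          # 14693 14701
## The numbers came from R, so the origin is already 1970-01-01 and
## restoring the class is all that is needed.
class(shifted) <- "Date"
shifted          # "2010-03-25" "2010-04-02"
```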
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595