David Winsemius <dwinsemius <at> comcast.net> writes: > On Feb 8, 2013, at 10:55 AM, Edwin Isensee wrote: > > > I'm using the read.xls function from gdata package to read one Excel file, > > like the example below: > > > > library(gdata) > > my_file <- '/Users/Desktop/Project.xlsx' > > valores <- read.xls(my_file) > > > > The problem is: one of the columns at the Excel file holds date information > > like 1-Jan-13, 5-Jan-13, 25-Jan-13. > Actually it holds them as number of days and only displays them in > that format. > > At Excel these information are treated > > as dates. When I read the file into a dataframe the corresponding data > > frame column holds numeric information like 41275, 41279, 41299. How can I > > convert these numeric information into the original date information?
> The easiest way would be to create a format in Excel. yyyy-mm-dd > should work well. Otherwise you should read the documentation about > date encoding. You can take those values and add them to something > like: as.Date("1900-01-01"). I say "something like" because Excel > date calculations have always had a strange bug that MS refuses to > acknowledge or fix that may make the date one or two days more or > less. > > > as.Date("1900-01-01") +c( 41275, 41279, 41299) > > [1] "2013-01-03" "2013-01-07" "2013-01-27" > The HFWutils package, now archived, had a function that did this. I extracted just that function: below I also post some information about where (I think) the "Excel date bug" referred to above comes from -- an interesting historical story. If you are using dates before Feb 1900, watch out (and read below)! ## from http://cran.r-project.org/src/contrib/Archive/ ## HFWutils/HFWutils_0.9.2008.05.17.tar.gz excelDate2Date <- function(excelDate) { Date <- excelDate + as.Date("1900-01-01") - 2 ## FIXME: add "if >1900-Feb-28" switch? return(Date) } ## http://www.cpearson.com/excel/datetime.htm ## Dates ## The integer portion of the number, ddddd, represents the number of ## days since 1900-Jan-0. For example, the date 19-Jan-2000 is stored ## as 36,544, since 36,544 days have passed since 1900-Jan-0. The ## number 1 represents 1900-Jan-1. It should be noted that the number ## 0 does not represent 1899-Dec-31. It does not. If you use the ## MONTH function with the date 0, it will return January, not ## December. Moreover, the YEAR function will return 1900, not 1899. ## Actually, this number is one greater than the actual number of ## days. This is because Excel behaves as if the date 1900-Feb-29 ## existed. It did not. The year 1900 was not a leap year (the year ## 2000 is a leap year). In Excel, the day after 1900-Feb-28 is ## 1900-Feb-29. In reality, the day after 1900-Feb-28 was 1900-Mar-1. ## This is not a "bug". Indeed, it is by design. Excel works this ## way because it was truly a bug in Lotus 123. When Excel was ## introduced, 123 has nearly the entire market for spreadsheet ## software. Microsoft decided to continue Lotus' bug, in order to ## fully compatible. Users who switched from 123 to Excel would not ## have to make any changes to their data. As long as all your dates ## later than 1900-Mar-1, this should be of no concern. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.