[R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-09 Thread Alexandre Courtiol
Hi R pkg developers,

We are facing a datetime handling issue which manifests itself in a
package we are working on.

For context, we noticed that reading datetime info from an Excel file
resulted in different data depending on the computer we used.

We are aware that time zone and regional settings are common sources
of trouble, but the code we are using was trying to circumvent this.
We only got as far as figuring out that the issue happens when
converting a POSIXlt into a POSIXct.

Please find below a minimal reproducible example where `foo` is
converted to `bar` on two different computers.
`foo` is a POSIXlt with a defined time zone and upon conversion to a
POSIXct, despite using a set time zone, we end up with `bar` being
different on Linux and on a Windows machine.

We noticed that the difference emerges from the internal call
`.Internal(as.POSIXct())` within `as.POSIXct.POSIXlt()`.
We also noticed that the internal function in R actually calls
getenv("TZ") within C, which is probably what explains where the
difference comes from.

Such a behaviour is probably expected and not a bug, but what would be
the strategy to convert a POSIXlt into a POSIXct that would not be
machine dependent?

Finally, we noticed that depending on the datetime used as a starting
point and on the time zone used when calling `as.POSIXct()`, we
sometimes get a difference between computers and sometimes not,
which adds to our puzzlement.

Many thanks.
Alex & Liam


``` r
## On Linux
foo <- structure(
  list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon = 9L, year = 121L,
       wday = 5L, yday = 273L, isdst = 0L),
  class = c("POSIXlt", "POSIXt"), tzone = "UTC")

bar <- as.POSIXct(foo, tz = "Europe/Berlin")

bar
#> [1] "2021-10-01 01:00:00 CEST"

dput(bar)
#> structure(1633042800, class = c("POSIXct", "POSIXt"),
#>           tzone = "Europe/Berlin")
```

``` r
## On Windows
foo <- structure(
  list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon = 9L, year = 121L,
       wday = 5L, yday = 273L, isdst = 0L),
  class = c("POSIXlt", "POSIXt"), tzone = "UTC")

bar <- as.POSIXct(foo, tz = "Europe/Berlin")

bar
#> [1] "2021-10-01 CEST"

dput(bar)
#> structure(1633046400, class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin")
```

-- 
Alexandre Courtiol, www.datazoogang.de

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-09 Thread Simon Urbanek
Alexandre,

it's better to parse the timestamp in the correct time zone:

> foo = as.POSIXlt("2021-10-01", "UTC")
> as.POSIXct(as.character(foo), "Europe/Berlin")
[1] "2021-10-01 CEST"
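
A minimal sketch of that re-parsing step as a reusable function, in case it helps. `reinterpret_in_tz` is a hypothetical name, not a base R function, and the sketch assumes whole-second timestamps, because formatting to text drops fractional seconds:

```r
# Hypothetical helper (not part of base R): keep the wall-clock fields of a
# POSIXlt and reinterpret them in another time zone by going through text,
# so that any stale isdst flag is discarded and recomputed by the parser.
reinterpret_in_tz <- function(x, tz) {
  stopifnot(inherits(x, "POSIXlt"))
  fmt <- "%Y-%m-%d %H:%M:%S"
  txt <- format(x, fmt)                 # wall-clock text, fractional seconds lost
  as.POSIXct(txt, tz = tz, format = fmt)
}

foo <- as.POSIXlt("2021-10-01 00:00:00", tz = "UTC")
bar <- reinterpret_in_tz(foo, "Europe/Berlin")

format(bar, "%Y-%m-%d %H:%M:%S")  # "2021-10-01 00:00:00"

# Berlin midnight on that date is two hours before UTC midnight.
as.numeric(bar) - as.numeric(as.POSIXct("2021-10-01 00:00:00", tz = "UTC"))  # -7200
```

Going through text is slower than manipulating the fields directly, but it does not rely on how the platform treats an inconsistent `isdst`.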

The issue stems from the fact that you are treating your timestamp as UTC 
(which it is not) while wanting to interpret the same values in a different 
time zone. The DST flag is 0 or 1 depending on the date, and your POSIXlt 
does not carry that information for the new zone, since you only attached 
the time zone without updating the flag:

> str(unclass(as.POSIXlt(foo, "Europe/Berlin")))
List of 9
 $ sec  : num 0
 $ min  : int 0
 $ hour : int 0
 $ mday : int 1
 $ mon  : int 9
 $ year : int 121
 $ wday : int 5
 $ yday : int 273
 $ isdst: int 0
 - attr(*, "tzone")= chr "Europe/Berlin"

Note that isdst is 0, carried over from the UTC entry (UTC has no DST), even 
though that date falls within DST (CEST) in Europe/Berlin. Compare that to the 
correctly parsed POSIXlt:

> str(unclass(as.POSIXlt(as.character(foo), "Europe/Berlin")))
List of 11
 $ sec   : num 0
 $ min   : int 0
 $ hour  : int 0
 $ mday  : int 1
 $ mon   : int 9
 $ year  : int 121
 $ wday  : int 5
 $ yday  : int 273
 $ isdst : int 1
 $ zone  : chr "CEST"
 $ gmtoff: int NA
 - attr(*, "tzone")= chr "Europe/Berlin"

where isdst is 1, since DST is indeed in effect. The OS difference seems to be 
that Linux respects the isdst information from POSIXlt while Windows and macOS 
ignore it. This behavior is documented (see ?DateTimeClasses):

 At all other times ‘isdst’ can be deduced from the
 first six values, but the behaviour if it is set incorrectly is
 platform-dependent.
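
As a rough check of that reading (an illustration only, not a description of R's internals): if midnight with isdst = 0 is taken as standard time (CET, UTC+1), the instant is 23:00 UTC on the previous day, which displays as 01:00 in Europe/Berlin, in line with the Linux output in the original post. The UTC instants below are written out by hand and are assumptions of the sketch:

```r
# Midnight read as standard time (CET = UTC+1) is 23:00 UTC the day before.
as_cet  <- as.POSIXct("2021-09-30 23:00:00", tz = "UTC")
# Midnight read as summer time (CEST = UTC+2) is 22:00 UTC the day before.
as_cest <- as.POSIXct("2021-09-30 22:00:00", tz = "UTC")

format(as_cet,  "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")  # "2021-10-01 01:00:00"
format(as_cest, "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")  # "2021-10-01 00:00:00"

as.numeric(as_cet)                        # 1633042800, the value reported on Linux
as.numeric(as_cet) - as.numeric(as_cest)  # 3600, i.e. the one-hour DST shift
```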

You can re-set isdst to -1 to make sure R will try to determine it:

> foo$isdst = -1L
> as.POSIXct(foo, "Europe/Berlin")
[1] "2021-10-01 CEST"

So, generally, you cannot simply change the time zone in POSIXlt - don't 
pretend the time is in UTC if it's not, you have to re-parse or re-compute the 
timestamps for it to be reliable or else the DST flag will be wrong.
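
This may also explain why the discrepancy shows up for some dates and not others: a flag of 0 carried over from UTC is only wrong when the target zone observes DST on that date. A small sketch, using two arbitrarily chosen dates, of how R fills in `isdst` when the text is parsed directly in the target zone:

```r
# Parse the same wall-clock time of day on a summer-time and a winter-time date.
summer <- as.POSIXlt("2021-10-01 00:00:00", tz = "Europe/Berlin")
winter <- as.POSIXlt("2021-12-01 00:00:00", tz = "Europe/Berlin")

summer$isdst  # 1: DST in effect, so a stale 0 from UTC would be inconsistent
winter$isdst  # 0: no DST, so a stale 0 from UTC happens to be correct
```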

Cheers,
Simon




Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-09 Thread Jeff Newmiller
... which is why tidyverse functions and Python datetime handling irk me so 
much.

Is tidyverse time handling intrinsically broken? They have a standard practice 
of reading time as UTC and then using force_tz to fix the "mistake". Same as 
Python.
