Many thanks Martin!

I was completely overlooking the behaviour for a length 1 vector with 00:00:00. More coffee needed for me I think.

Best

Tim


On 15/08/2023 08:58, Martin Maechler wrote:
Tim Taylor
     on Mon, 14 Aug 2023 12:26:51 +0100 writes:
     > Martin,
     > Thank you. Everything you have written is helpful and I admit I am
     > likely guilty of using as.character() instead of format() in the past.

     > Ignoring the above though, one thing I’m still unclear on is the special
     > handling of zero (or rather non-zero time) seconds in the method. Is the
     > motivation that as.character() outputs the minimum necessary information? It is
     > clearly a very deliberate choice but the reasoning is still going a little over my
     > head.

     > Best
     > Tim

Hmm, I really don't understand what you don't understand.
Here's some annotated R code exemplifying that indeed now,
     as.character(x)[j] === as.character(x[j])
but previously that was not fulfilled  {when  as.character() was
the same as format() for POSIXct or POSIXlt}:

##-----------------------------------------------------------------------------
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0)
str(t0) #  POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
t0    #  "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
t0[1] #  "1975-01-01 CET" <-- yes, *no* 00:00:00   in no version of R

## In R <= 4.2.x  as.character() was using format() for POSIX{ct,lt} :
as.character(t0)    # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
as.character(t0)    # "1975-01-01"          "1975-01-01 15:27:00" << for R >= 4.3.0
as.character(t0[1]) # "1975-01-01"  {in all versions of R}
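
The element-wise property above can also be checked programmatically. The following is only a sketch: the explicit tz = "UTC" and the names one_by_one, elementwise_ok, fmt, s0 and roundtrip_ok are assumptions made for illustration, so that the result does not depend on the local timezone.

```r
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0, tz = "UTC")   # explicit tz is an assumption of this sketch

## Convert each element on its own, then compare with the vectorised call.
one_by_one <- vapply(seq_along(t0), function(j) as.character(t0[j]),
                     character(1))
elementwise_ok <- identical(as.character(t0), one_by_one)
elementwise_ok   # expected to be TRUE for R >= 4.3.0, FALSE for R <= 4.2.x

## If uniform strings are wanted, use format() with an explicit format string.
fmt <- "%Y-%m-%d %H:%M:%S"
s0 <- format(t0, fmt)
roundtrip_ok <- all(as.POSIXct(s0, tz = "UTC", format = fmt) == t0)
```

Passing the same format string in both directions is what keeps the midnight element from losing its time component.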


Note that indeed   as.character()  does drop redundant trailing 0s :

   > as.character(c(0.5, 0.75, pi))
   [1] "0.5"              "0.75"             "3.14159265358979"

whereas format() does not (ensuring resulting strings of the same nchar(.)):

   > format(      c(0.5, 0.75, pi))
   [1] "0.500000" "0.750000" "3.141593"
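
A small sketch of the same contrast for plain numbers, assuming default options(); the names ac_ok, widths, f_alone and f_vec are chosen here for illustration only:

```r
x <- c(0.5, 0.75, pi)

## as.character(): each element is converted independently of the others.
ac_ok <- all(vapply(seq_along(x),
                    function(j) identical(as.character(x)[j], as.character(x[j])),
                    logical(1)))

## format(): all elements are padded to one common width ...
widths <- nchar(format(x))

## ... so the string for x[1] depends on what else is in the vector.
f_alone <- format(x[1])
f_vec   <- format(x)[1]
identical(f_alone, f_vec)   # FALSE: formatting x[1] alone gives a shorter string
```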



     >> On 14 Aug 2023, at 09:52, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
     >>
     >> 
     >>>
     >>>>>>> Andy Teucher
     >>>>>>> on Fri, 11 Aug 2023 16:07:36 -0700 writes:
     >>
     >>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3
     >>> (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I
     >>> have come across a new behaviour and I wonder if it is unintended?
     >>
     >> Well, as the NEWS entry says
     >> (partly visible in the url above -- which only shows one part of
     >> the several changes for R 4.3) :
     >>
     >> • as.character(<POSIXt>) now behaves more in line with the methods
     >> for atomic vectors such as numbers, and is no longer influenced
     >> by options().  Ditto for as.character(<Date>).  The
     >> as.character() method gets arguments digits and OutDec with
     >> defaults _not_ depending on options().  Use of as.character(*,
     >> format = .) now warns.
     >>
     >> It was "inconsistent" to have  as.character(.) basically use format(.) for
     >> these datetime objects.
     >> as.character(x) for basic R types such as numbers, strings, logicals,...
     >> fulfills the important property
     >>
     >> as.character(x)[j] === as.character(x[j])
     >>
     >> whereas that is very much different for format() where indeed,
     >> the formatting  of  x[1]  may quite a bit depend on the other
     >> x[j]'s values:
     >>
     >>> as.character(c(1, pi, pi/2^20))
     >> [1] "1"    "3.14159265358979"   "2.99605622633914e-06"
     >>
     >>> format(c(1, pi, pi/2^20))
     >> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
     >>> format(c(1, pi))
     >> [1] "1.000000" "3.141593"
     >>> format(c(1, 10))
     >> [1] " 1" "10"
     >>>
     >>
     >>
     >>> When calling `as.character.POSIXt()` on a vector that contains
     >>> elements where the time component is midnight (00:00:00), it drops the time component of
     >>> that element in the resulting character vector. Previously the time component was
     >>> retained:
     >>
     >>> In R 4.2.3:
     >>
     >>> ```
     >>> R.version$version.string
     >>> #> [1] "R version 4.2.3 (2023-03-15)"
     >>
     >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
     >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
     >>
     >>> (tc <- as.character(t))
     >>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
     >>> ```
     >>
     >>> In R 4.3.1:
     >>
     >>> ```
     >>> R.version$version.string
     >>> #> [1] "R version 4.3.1 (2023-06-16)"
     >>
     >>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
     >>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
     >>
     >>> (tc <- as.character(t))
     >>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
     >>> ```
     >>
     >> You should have used format()  here  or at least should do so now.
     >>
     >>> This has consequences when round-tripping from POSIXt ->
     >>> character -> POSIXt,
     >>
     >> Well, I'd argue that such a "round trip" is not a "good idea"
     >> anyway, as there are quite a few platform (local timezone for
     >> one) issues, and precision is lost, notably for POSIXlt which
     >> may be more precise than you typically get, etc.
     >>
     >>> since `as.POSIXct.character()` drops the time component from the
     >>> entire vector if any element does not have a time component:
     >>
     >> Well, there *is* no as.POSIXct.character()  {but we understand what you mean}:
     >> If you look at the help page you'd see that there's  as.POSIXlt.character()
     >> {which is called from as.POSIXct.default()}
     >> with a 3rd argument 'format' and a 4th argument 'tryFormats'
     >> {and a lot more information -- the whole topic is far from trivial}.
     >>
     >> Now, indirectly you would want R to be "smart", i.e. the
     >> as.POSIXlt.character() method "guess better" about what the
     >> user wants. ...
     >> ... and I agree that is not an unreasonable expectation, e.g.,
     >> for your example of wanting
     >>
     >> c("1975-01-01", "1975-01-01 15:27:00")
     >>
     >> to  "work".
     >>
     >> as.POSIXlt.character() is well documented to be trying all of
     >> the `tryFormats` in order, until it finds one that works for all
     >> vector components (or fail / use NA if none works);
     >> and here it's only a format which drops the time that works for
     >> all (i.e. both, in the example).
     >>
     >> { Even though its behavior is well documented,
     >> one could even argue that by default you'd want a warning in
     >> such a case where "so much" is lost.
     >> I think however that introducing such a warning  may trip too
     >> much current code relying .. also, the extra *checking* may be
     >> somewhat costly .. (?)  .... anyway that's an interesting side topic
     >> }
     >>
     >> Instead what you want here is for each string (element of the
     >> character vector) to try the `tryFormats` and use the best
     >> available *individually*  {smart R users ==> "think lapply(.)"} :
     >> Currently, this would be  "something like"  unlist(lapply(x, as.POSIXlt))
     >> well, and then you need to jump through an additional hoop.
     >> If you want POSIXct,  like this :
     >>
     >> .POSIXct(unlist(lapply( * , as.POSIXct)))
     >>
     >> For your example
     >>
     >> ch <- c("1975-01-01", "1975-01-01 15:27:00")
     >>
     >>> str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
     >> POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
     >>
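
The per-element strategy described above could be wrapped up roughly as follows. This is only a sketch: parse_each() is a hypothetical helper, not part of base R, and the fixed tz = "UTC" default is an assumption made so the result does not depend on the local timezone.

```r
## parse_each(): hypothetical helper that tries the default `tryFormats`
## separately for every string, instead of once for the whole vector.
parse_each <- function(x, tz = "UTC") {
  secs <- vapply(x, function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  .POSIXct(secs, tz = tz)   # rebuild a POSIXct vector from seconds since the epoch
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")

## One format must fit all elements, so (as discussed above) only a
## date-only format works and the 15:27:00 is dropped.
together <- as.POSIXct(ch, tz = "UTC")

## Each string gets the first format that fits it individually.
each <- parse_each(ch)
format(each, "%Y-%m-%d %H:%M:%S")
```

Calling as.POSIXct() once per string is slower than a single vectorised call, which is one reason this may not be a sensible default.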
     >> ---
     >>
     >> After all that, yes, I agree that we should consider making
     >> this much easier. E.g.,  by adding an optional argument to
     >> as.POSIXlt.character()   say, `each` with default FALSE such
     >> that as.POSIXlt(*,  each=TRUE)
     >> {and also as.POSIXct(*,  each=TRUE) } would follow the above
     >> strategy.
     >>
     >> ?
     >>
     >> Martin
     >>
     >> --
     >> Martin Maechler
     >> ETH Zurich   and   R Core Team
     >>
     >> ______________________________________________
     >> R-devel@r-project.org mailing list
     >> https://stat.ethz.ch/mailman/listinfo/r-devel

