I recently noticed that using %in% on objects of class Date is about
100x slower from R 4.3 onward than in older versions.  I have not
included results from R prior to 4.3, but in those versions the first
and second methods below give the same result and are both very fast.

I suggest a fix below that treats the Date class in the same manner
as POSIXct and POSIXlt via the mtfrm generic, which is ultimately
called by %in%.  I found only one reference to this issue
(see 
https://stackoverflow.com/questions/77909868/why-is-match-slower-on-dates-datetimes-in-r-version-4-3-2-than-version-4-2-2).

I apologize if this should have been sent to r-h...@r-project.org or
if this issue has already been addressed.  Thanks.

------------------------------------------------------------------------------------------------------------
RStudio session below; note that R --vanilla gives the same results
------------------------------------------------------------------------------------------------------------
> sessionInfo()$R.version$version.string
[1] "R version 4.5.1 (2025-06-13)"
>
> date_seq <- seq(as.Date("1705-01-01"), as.Date("2024-12-31"), by="days")
> dt1 <- as.Date("2024-05-01")
>
> # %in%
> tictoc::tic()
> tmp <- dt1 %in% date_seq
> tictoc::toc()
0.125 sec elapsed
>
> # cast to integer then %in% (gives fast results similar to old R without
> # casting to int)
> tictoc::tic()
> tmp <- as.integer(dt1) %in% as.integer(date_seq)
> tictoc::toc()
0.001 sec elapsed
>
> # Create an mtfrm method for Date class that is identical to POSIXct and
> # POSIXlt methods
> # This results in the expected dramatic speedup
> temp_fun <- function(x)
+   as.vector(x, "any")
>
> .S3method("mtfrm", "Date", temp_fun)
>
> # %in% with mtfrm method for Date
> tictoc::tic()
> tmp <- dt1 %in% date_seq
> tictoc::toc()
0.002 sec elapsed
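
For anyone without tictoc installed, the comparison above can be sketched
in base R with system.time().  The probe dates and variable names below
are illustrative and not part of the session above, and the exists()
guard is my assumption so that the script also runs on R < 4.3, where
mtfrm does not exist.  It also checks that the proposed method returns
the same answers as the integer cast.

```r
# Sketch only: base-R version of the comparison, no tictoc required.
date_seq <- seq(as.Date("1705-01-01"), as.Date("2024-12-31"), by = "days")

# Illustrative probes: one date inside the sequence, one outside it.
probe <- as.Date(c("2024-05-01", "1600-01-01"))

# Reference answer via the integer cast used in the session above.
expected <- as.integer(probe) %in% as.integer(date_seq)

# Time %in% before any Date method for mtfrm is registered.
t_default <- system.time(res_default <- probe %in% date_seq)[["elapsed"]]

# Register the proposed method; guarded because mtfrm was added in R 4.3.
if (exists("mtfrm", envir = baseenv())) {
  .S3method("mtfrm", "Date", function(x) as.vector(x, "any"))
}

# Time %in% again with the Date method in place.
t_method <- system.time(res_method <- probe %in% date_seq)[["elapsed"]]

# Both paths should agree with the integer-cast reference.
stopifnot(identical(res_default, expected),
          identical(res_method, expected))

print(c(default = t_default, with_method = t_method))
```

Timings will vary by machine, so only the relative difference between
the two elapsed values is meaningful.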

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
