[R-pkg-devel] Identify Original Column Names of Model Matrix
Good day,

I am developing a wrapper around xgboost, which does not yet support the factor variable type (I see that it is on the developer's version 2.0 task list). It requires input data to be in one-hot encoding, which is created by Matrix::sparse.model.matrix. For further analysis, such as variable importance, is there a way to identify which original feature each column of a sparse.model.matrix result was derived from? Using str(oneHotMatrix), I don't see any class slots or tacked-on attributes that would confidently allow the identification of the original column names of the expanded input data. Is there an alternative way to robustly identify the original variable names?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
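One possible approach, sketched here with base R's model.matrix() rather than the sparse variant: the dense model matrix carries an "assign" attribute that maps each column to a term of the formula. The data frame `df` and its columns are made up for illustration. Whether the object returned by Matrix::sparse.model.matrix carries the same information may depend on the Matrix version, so it is worth checking with attributes() or slotNames() before relying on it.

```r
# Toy data (hypothetical): one numeric and one factor predictor
df <- data.frame(
  age   = c(23, 35, 41, 58),
  group = factor(c("a", "b", "c", "a"))
)

form <- ~ age + group
mm <- model.matrix(form, data = df)

# "assign" holds, for each column, the index of the term it came from
# (0 stands for the intercept)
term_index  <- attr(mm, "assign")
term_labels <- attr(terms(form), "term.labels")

# Look up the originating term for every expanded column
origin  <- c("(Intercept)", term_labels)[term_index + 1L]
col_map <- data.frame(column = colnames(mm), variable = origin)
print(col_map)
```

Note that the labels are formula terms, so an interaction such as a:b would be reported as one term rather than as two variables.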
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
On Sun, Oct 9, 2022 at 9:31 PM Jeff Newmiller wrote:
>
> ... which is why tidyverse functions and Python datetime handling irk me so
> much.
>
> Is tidyverse time handling intrinsically broken? They have a standard
> practice of reading time as UTC and then using force_tz to fix the "mistake".
> Same as Python.

Can you point to any docs that lead you to this conclusion so we can get them fixed? I strongly encourage people to parse date-times in the correct time zone; this is why lubridate::ymd_hms() and friends have a tz argument.

Hadley

--
http://hadley.nz
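To illustrate the difference between parsing in the correct time zone and parsing as UTC, here is a small base R sketch. The timestamp string is made up for the example; lubridate::ymd_hms(stamp, tz = "Europe/Berlin") should give the same instant as the first call.

```r
stamp <- "2022-10-09 21:31:00"

# Parsed in the zone the clock time was recorded in
berlin <- as.POSIXct(stamp, tz = "Europe/Berlin")

# The same text parsed as UTC is a different instant
utc <- as.POSIXct(stamp, tz = "UTC")

# Gap between the two instants, in hours (Berlin is on CEST, UTC+2, on that date)
gap_hours <- as.numeric(difftime(utc, berlin, units = "hours"))
gap_hours
```

Parsing in the right zone from the start avoids having to relabel the zone afterwards.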
[R-pkg-devel] Rd cross-references to Suggested package
Hi all,

I'd like to link to a help page of a package in my package's Suggests. WRE, section 2.5 says,

"Historically (before R version 4.1.0), links of the form \link[pkg]{foo} and \link[pkg:bar]{foo} used to be interpreted as links to files foo.html and bar.html in package pkg, respectively. For this reason, the HTML help system looks for file foo.html in package pkg if it does not find topic foo, and then searches for the topic in other installed packages. To test that links work both with both old and new systems, the pre-4.1.0 behaviour can be restored by setting the environment variable _R_HELP_LINKS_TO_TOPICS_=false.

"Packages referred to by these ‘other forms’ should be declared in the DESCRIPTION file, in the ‘Depends’, ‘Imports’, ‘Suggests’ or ‘Enhances’ fields."

This seems to imply that it's possible... though I don't understand when I need to set _R_HELP_LINKS_TO_TOPICS_=false in order to test that the link is done correctly.

I'm using \link[pkg]{foo} in R 4.2.1. I ran R CMD build/INSTALL/check with and without that env var set to false. Both times the suggested package was not installed on my library path, so I had to set _R_CHECK_FORCE_SUGGESTS_=false for R CMD check --as-cran.

I didn't notice a difference in output from R CMD check. Both runs had:

* checking Rd cross-references ... NOTE
Package unavailable to check Rd xrefs: ‘timeSeries’

I'd appreciate any thoughts and/or pointers to other documentation.

Best,
Josh

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
Liam,

I think I have failed to convey my main point in the last e-mail - which was that you want to parse the date/time in the timezone that you care about, so in your example that would be

> foo <- as.Date(33874, origin = "1899-12-30")
> foo
[1] "1992-09-27"
> as.POSIXlt(as.character(foo), "Europe/Berlin")
[1] "1992-09-27 CEST"

I was explicitly saying that you do NOT want to simply change the time zone on POSIXlt objects as that won't work for reasons I explained - see my last e-mail.

Cheers,
Simon

> On 11/10/2022, at 6:31 AM, Liam Bailey wrote:
>
> Hi all,
>
> Thanks Simon for the detailed response, that helps us understand a lot better what’s going on! However, with your response in mind, we still encounter some behaviour that we did not expect.
>
> I’ve included another minimum reproducible example below to expand on the situation. In this example, `foo` is a Date object that we generate from a numeric input. Following your advice, `bar` is then a POSIXlt object where we now explicitly define timezone using argument tz. However, even though we are explicit about the timezone the POSIXlt that is generated is always in UTC. This then leads to the issues outlined by Alexandre above, which we now understand are caused by DST.
>
> ``` r
> #Generate date from numeric
> #Not possible to specify tz at this point
> foo <- as.Date(33874, origin = "1899-12-30")
> dput(foo)
> #> structure(8305, class = "Date")
>
> #Convert to POSIXlt specifying UTC timezone
> bar <- as.POSIXlt(foo, tz = "UTC")
> dput(bar)
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> #> "POSIXt"), tzone = "UTC")
>
> #Convert to POSIXlt specifying Europe/Berlin.
> #Time zone is still UTC
> bar <- as.POSIXlt(foo, tz = "Europe/Berlin")
> dput(bar)
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> #> "POSIXt"), tzone = "UTC")
> ```
>
> We noticed that this occurs because the tz argument is not passed to `.Internal(Date2POSIXlt())` inside `as.POSIXlt.Date()`.
>
> Reading through the documentation for `as.POSIX*` we can see that this behaviour is described:
>
> > “Dates without times are treated as being at midnight UTC.”
>
> In this case, if we want to convert a Date object to POSIX* and specify a (non-UTC) timezone would the best strategy be to first coerce our Date object to character? Alternatively, `lubridate::as_datetime()` does seem to recognise the tz argument and convert a Date object to POSIX* with non-UTC time zone (see second example below). But it would be nice to know if there are subtle differences between these two approaches that we should be aware of.
>
> ``` r
> foo <- as.Date(33874, origin = "1899-12-30")
> dput(foo)
> #> structure(8305, class = "Date")
>
> #Convert to POSIXct specifying UTC timezone
> bar <- lubridate::as_datetime(foo, tz = "UTC")
> dput(as.POSIXlt(bar))
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> #> "POSIXt"), tzone = "UTC")
>
> #Convert to POSIXct specifying Europe/Berlin
> bar <- lubridate::as_datetime(foo, tz = "Europe/Berlin")
> dput(as.POSIXlt(bar))
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> #> year = 92L, wday = 0L, yday = 270L, isdst = 1L, zone = "CEST",
> #> gmtoff = 7200L), class = c("POSIXlt", "POSIXt"), tzone = c("Europe/Berlin",
> #> "CET", "CEST"))
> ```
>
> Thanks again for all your help.
> Alex & Liam
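The advice in this message (parse the date in the zone you care about instead of relabelling a UTC object) can be sketched as follows. This only restates the character route shown above with a few extra checks; it assumes the system time zone database knows "Europe/Berlin".

```r
foo <- as.Date(33874, origin = "1899-12-30")

# Go through character so the date is parsed as local midnight in Berlin
bar <- as.POSIXlt(as.character(foo), tz = "Europe/Berlin")

# Inspect the wall-clock fields and the zone attached to the result
format(bar, "%Y-%m-%d %H:%M:%S %Z")
bar$hour
attr(bar, "tzone")[1]
```

Because the string is parsed directly in the target zone, the wall-clock time is midnight there and the DST flag is filled in by the parser.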
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
Hi Simon,

Thanks for the clarification.

From a naive developer point of view, we were initially baffled that the generic as.POSIXlt() does very different things on a character and on a Date input:

as.POSIXlt(as.character(foo), "Europe/Berlin")
[1] "1992-09-27 CEST"

as.POSIXlt(foo, "Europe/Berlin")
[1] "1992-09-27 UTC"

Based on what you said, it does make sense: it is only when creating the date/time that we want to include the time zone, and that only happens when we are not already working on a previously created date. That is your subtle but spot-on distinction between "parsing" and "changing" the time zone.

Yet, we do find it dangerous that as.POSIXlt.Date() accepts a time zone but does nothing with it, especially when the help file starts with:

Usage
as.POSIXlt(x, tz = "", ...)

The behaviour is documented, as Liam reported, but still, we will almost certainly not be the last ones to trip on this (without even adding the additional issue of as.POSIXct() behaving differently across OS).

Thanks again,

Alex & Liam
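To make the practical consequence of the two inputs concrete, the following sketch compares the instants they denote. It uses as.POSIXct() without a tz argument for the Date input, which is treated as midnight UTC as the documentation quoted in this thread states; the variable names are only illustrative.

```r
foo <- as.Date("1992-09-27")

# A Date converted directly denotes the instant of midnight UTC
utc_midnight <- as.POSIXct(foo)

# The same calendar date parsed from text is midnight in Berlin
local_midnight <- as.POSIXct(as.character(foo), tz = "Europe/Berlin")

# The two instants differ by the UTC offset in force on that date (CEST, +2 h)
gap <- as.numeric(difftime(utc_midnight, local_midnight, units = "hours"))
gap

# Midnight UTC shown on a Berlin clock
format(utc_midnight, tz = "Europe/Berlin", usetz = TRUE)
```

So the character route and the Date route agree on the calendar date but not on the instant, which matters as soon as durations or DST transitions are involved.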
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
Right now as.POSIXlt.Date() is just

function (x, ...)
.Internal(Date2POSIXlt(x))

How expensive would it be to throw a warning when '...' is provided by the user/discarded?

Alternately, perhaps the documentation could be amended, although I'm not quite sure what to suggest. (The sentence Liam refers to, "Dates without times are treated as being at midnight UTC." is correct but terse ...)
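As a rough illustration of the warning idea raised in this message, here is a user-level wrapper. The function name as_posixlt_date_strict is hypothetical and this is not how base R implements anything; it only shows that checking for discarded arguments with ...length() is a cheap test.

```r
# Hypothetical wrapper, not the base R implementation:
# warn when arguments that the Date method would discard are supplied
as_posixlt_date_strict <- function(x, ...) {
  stopifnot(inherits(x, "Date"))
  if (...length() > 0L) {
    warning("extra arguments (such as 'tz') are ignored for Date input; ",
            "the result is midnight UTC", call. = FALSE)
  }
  as.POSIXlt(x)
}

foo <- as.Date("1992-09-27")

# Report the warning as a message and carry on with the conversion
res <- withCallingHandlers(
  as_posixlt_date_strict(foo, tz = "Europe/Berlin"),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res$hour
```

A wrapper like this lives in user code; changing the behaviour of the base method itself is a separate question.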
[R-pkg-devel] Best way forward on a CRAN archived package
Hi:

I have some doubts about how to proceed in this case. I am the developer of tidyterra, and I received an email from CRAN on 23Sep2022 about an issue with the package, setting a deadline of 07Oct2022 to correct it.

I sent a patch that was accepted on CRAN on 29Sep2022 and that fixed the issue (or at least I am pretty sure I solved it). I received no further feedback from CRAN, so I assumed the package was safe. However, it was finally archived on 07Oct2022.

I have already sent an email to CRAN to check whether they think the issues still persist (or maybe they missed the patch submission?), but I am in a rush since there are other packages that depend on tidyterra and they may be at risk of being archived on CRAN as well.

So my question is: what is the best way forward at this point? Should I wait to get some feedback from CRAN, or is it best to resubmit the package (I already have a new patch prepared)? I acknowledge that "The time of the volunteers is CRAN’s most precious resource", so my goal is to reduce their burden as much as possible.

Kind regards

--
Have a nice day!
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
I have no idea how to get readxl::read_excel to import a timestamp column in a timezone. It is true that Excel has no concept of timezones, but the data one finds there usually came from a text file at some point. Importing as character is a feasible strategy, but trying to convince an intermediate user to go to that much trouble is a headache when the issue is ignored in the help file.

It is evidently possible to specify a locale input to readr::read_csv, but the default behaviour guesses timestamp columns and assumes "UTC", and a file may contain data from different timezones (UTC and local civil are a common combination). Again, character import and manual conversion are needed.

--
Sent from my phone. Please excuse my brevity.
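The "character import and manual conversion" strategy described in this message might look like the sketch below. The data frame stands in for a file whose timestamp column was imported as text (for example with col_types = "text" in readxl::read_excel); the `zone` column and its values are assumptions made for illustration.

```r
# Stand-in for imported data: timestamps kept as character,
# with a (hypothetical) column saying which zone each row was recorded in
raw <- data.frame(
  stamp = c("2022-10-09 21:31:00", "2022-10-09 21:31:00"),
  zone  = c("UTC", "America/Los_Angeles"),
  stringsAsFactors = FALSE
)

# as.POSIXct() takes one tz per call, so parse row by row in each row's zone
secs <- mapply(
  function(s, z) as.numeric(as.POSIXct(s, tz = z)),
  raw$stamp, raw$zone
)

# Store everything as instants on a common UTC axis
raw$instant <- as.POSIXct(unname(secs), origin = "1970-01-01", tz = "UTC")
raw
```

Keeping the original text column alongside the converted one makes it easy to audit the conversion later.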
Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers
> Ben Bolker
> on Mon, 10 Oct 2022 16:59:35 -0400 writes:

> Right now as.POSIXlt.Date() is just
> function (x, ...)
> .Internal(Date2POSIXlt(x))

It has been quite a bit different in R-devel for a little while. NEWS entries (there are more already, and more coming on the wide topic):

* The as.POSIXlt() and as.POSIXct() default methods now do obey their tz argument, also in this case.

* as.POSIXlt() now does apply a tz (timezone) argument, as does as.POSIXct(); partly suggested by Roland Fuss on the R-devel mailing list.

and indeed it would have been good had you used (and read) the R-devel mailing list, which is much more appropriate on the topic of *changing* base R behavior.

> How expensive would it be to throw a warning when '...' is provided by
> the user/discarded ??

> Alternately, perhaps the documentation could be amended, although I'm
> not quite sure what to suggest. (The sentence Liam refers to, "Dates
> without times are treated as being at midnight UTC." is correct but
> terse ...)