[Rd] Should seq.Date() return double storage?
today <- Sys.Date() typeof(today) # [1] "double" typeof(seq(today, by=1, length.out=2)) # [1] "integer" Clearly minor as it doesn't seem to have come up before (e.g. coercion to numeric will happen automatically whenever fractional dates are needed); I only noticed because of a test using identical failing: identical( seq(today, by=1, length.out=10), today + 0:9 ) # [1] FALSE It's easy in this case to fix the test using coercion, but this could (and did in practice) come up at deeper levels of nesting that become more onerous to handle. And using all.equal() comes with its own tradeoffs. The fix would be easy enough -- at a glance there are two usages of .Date(seq.int(...)) in seq.Date() that could be replaced by .Date(as.numeric(seq.int(...))). Mike C __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] .onLoad, packageStartupMessage, and R CMD check
I wrote a linter to stop users from using packageStartupMessage() in their .onLoad() hook because of the R CMD check warning it triggers: https://github.com/wch/r-source/blob/8b6625e39cd62424dc23399dade37f20fa8afa91/src/library/tools/R/QC.R#L5167 However, this received some pushback which I ultimately agree with, and moreover ?.onLoad seems to agree as well: > Loading a namespace should where possible be silent, with startup messages given by \code{.onAttach}. These messages (**and any essential ones from \code{.onLoad}**) should use \code{\link{packageStartupMessage}} so they can be silenced where they would be a distraction. **emphasis** mine. That is, if we think some message is _essential_ to print during loadNamespace(), we are told to use packageStartupMessage(). Should we remove this R CMD check warning? __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] .onLoad, packageStartupMessage, and R CMD check
Examining more closely, it's a NOTE produced by R CMD check -- originally I had thought it was a WARNING, which I think would have been too strong for this case. A NOTE actually seems fine, on second thought. For a tiny bit of context, it's common for us to issue messaging around some state initialization, which has to happen after some (ex-ante unknown) set of packages are loaded. It's important to do so whether or not the package is attached, so the proviso in .onLoad() indeed makes the most sense. Thanks! On Thu, Nov 4, 2021 at 1:02 PM Gabriel Becker wrote: > > Hi Michael, > > Indeed, just to elaborate further on what I believe Duncan's point is, can > you give any examples, "dire" or not, that are appropriate when the package > is loaded but not attached (ie none of its symbols are visible to the user > without using :::)? > > The only things I can think of are a package that changes the behavior of > other, attached package code, such as conflicted. Doing so is very much an > anti-pattern imo generally, with something like conflicted being an > (arguable) exception. And that's assuming conflicted even works/does anything > when loaded but not attached (I have not confirmed whether thats the case or > not). That or a package that is at end-of-life and is or soon will be > unsupported entirely. > > The examples don't need to be yours, per se, if you know what those pushing > back against your linter were using messages from .onLoad for... > > Best, > ~G > > > > On Thu, Nov 4, 2021 at 12:37 PM Duncan Murdoch > wrote: >> >> On 04/11/2021 2:50 p.m., Michael Chirico via R-devel wrote: >> > I wrote a linter to stop users from using packageStartupMessage() in >> > their .onLoad() hook because of the R CMD check warning it triggers: >> > >> > https://github.com/wch/r-source/blob/8b6625e39cd62424dc23399dade37f20fa8afa91/src/library/tools/R/QC.R#L5167 >> > >> > However, this received some pushback which I ultimately agree with, >> > and moreover ?.onLoad seems to agree as well: >> > >> >> Loading a namespace should where possible be silent, with startup >> > messages given by \code{.onAttach}. These messages (**and any essential >> > ones from \code{.onLoad}**) should use \code{\link{packageStartupMessage}} >> > so they can be silenced where they would be a distraction. >> > >> > **emphasis** mine. That is, if we think some message is _essential_ to >> > print during loadNamespace(), we are told to use >> > packageStartupMessage(). >> > >> > Should we remove this R CMD check warning? >> >> The help page doesn't define what an "essential" message would be, but I >> would assume it's a message about some dire condition, not just "Hi! I >> just got loaded!". So I think a note or warning would be appropriate, >> but not an error. >> >> Do you have an example of something that should routinely print, but >> that triggers a warning when checked? >> >> Duncan Murdoch >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Shouldn't "Loading" be "Attaching" when referring to library() calls?
require() and library() both emit this message immediately before running library(): "Loading required package: %s" https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L967-L968 https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L684-L685 Shouldn't "Loading" be "Attaching" instead? My understanding is "a package is loaded" is equivalent to "isNamespaceLoaded()", i.e., loadNamespace() was run. And that "a package is attached" is equivalent to "pkg %in% .packages()", i.e., library(pkg) was run. Is the terminology not so precise? There's certainly ambiguity in the internal variable names referenced above -- in require, we see loaded <- paste0("package:", package) %in% search() https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L680 Whereas in library() [via .getRequiredPackages2()] we see attached <- paste0("package:", pkg) %in% search() https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L931 Mike C __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Line-terminal \ in character consants -- missing from ?Quotes ?
I'm coming across some code that uses the fact the parser ignores a line-terminal '\', e.g. identical("\ ", "\n") # [1] TRUE x = "abc \ def" y = "abc \ndef" identical(x, y) # [1] TRUE However: identical("\\n", "\n") # [1] FALSE This appears to be undocumented behavior; the closest thing I see in ?Quotes suggests it should be an error: > Escaping a character not in the following table is an error. ('\n' is in the table, but my understanding is the 'n' is what's being escaped v-a-v the "error", which seems confirmed by the third, FALSE, example above) Is this a bug, is the omission from ?Quotes a bug, or is this just undocumented behavior? Mike C __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Line-terminal \ in character consants -- missing from ?Quotes ?
I'm still hung up on ?Quotes -- I can't see mention of 'newline' as a valid escape. It mentions the literal sequence '\' 'n', where 'n' is being escaped. Glanced at the parser blame and apparently the terminal '\' is the older behavior, and what I'm used to, i.e. literal newlines in char constants to make multi-line strings, is new (though still 20 years old): https://github.com/r-devel/r-svn/commit/bc3f20e4e686be556877bb6bd2882ae8029fd17f The NEWS entry there does say the same thing as you -- "escaping the newlines with backslashes". >From the parser, I think ?Quotes is just missing "newline" as being a valid escaped character, c.f. https://github.com/r-devel/r-svn/blob/f55b24945d56e824f124638c596b99887441354a/src/main/gram.y#L2823-L2830 ('\n' is treated like '\') https://github.com/r-devel/r-svn/blob/f55b24945d56e824f124638c596b99887441354a/src/main/gram.y#L2978-L3008 ('\n' is in the list of valid items after '\') I don't see any special handling for '\r', so there may be a gap in the R parser? Or I just don't understand what I'm reading in the parser :) Mike C On Sun, Feb 12, 2023 at 3:38 AM Duncan Murdoch wrote: > > On 12/02/2023 12:07 a.m., Michael Chirico via R-devel wrote: > > I'm coming across some code that uses the fact the parser ignores a > > line-terminal '\', e.g. > > > > identical("\ > > ", "\n") > > # [1] TRUE > > > > x = "abc \ > > def" > > y = "abc \ndef" > > identical(x, y) > > # [1] TRUE > > > > However: > > identical("\\n", "\n") > > # [1] FALSE > > > > This appears to be undocumented behavior; the closest thing I see in > > ?Quotes suggests it should be an error: > > > >> Escaping a character not in the following table is an error. > > > > ('\n' is in the table, but my understanding is the 'n' is what's being > > escaped v-a-v the "error", which seems confirmed by the third, FALSE, > > example above) > > > > Is this a bug, is the omission from ?Quotes a bug, or is this just > > undocumented behavior? > > In your first example, you have a backslash which says to escape the > next char. The next char is a newline char. The result is an escaped > newline, which apparently is a newline. > > The same thing happens in the second example. > > The third example is an escaped backslash, i.e. a backslash, followed by > n. That's not the same as an escaped n, which gives a newline. > > So I think the behaviour might be reasonable. > > The thing I'd worry about is whether things are handled properly on > Windows, where the newline is two characters (CR LF). It might be that > the backslash at the end of the line escapes the CR, and you get a \r > out of it instead of a \n. But maybe not, the parser knows about CR LF > and internally converts it to \n, so if that happens early enough, > things would be fine. > > Duncan Murdoch > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] FR: names= argument for load()
I have three use cases in mind for an argument to load specifying which variables to load from the input 'file': 1. Avoid overwriting existing variables. ?load recommends using envir= or attach() to avoid overwrites. An argument like names= would allow even finer control to avoid collisions. 2. Avoid loading too many (in quantity or in memory size) variables from a large file. We might save dozens or hundreds of models to the same file and then load them more lazily. 3. Helping static analysis. Currently, when load() is used in a script, it becomes impossible to distinguish truly undefined variables from those defined implicitly by load(). With a names= argument, it can be possible to know statically precisely which names were introduced by load, and which might be a bug. (of course there's nothing stopping authors from having non-literal entries to names=, but that's a more general issue of static analysis). Michael Chirico __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
Consider: x <- list(`a b` = 1) x$a (i.e., press the 'tab' key after typing 'x$a') The auto-complete mechanism will fill the buffer like so: x$a b This is not particularly helpful because this is now a syntax error. It seems to me there's a simple fix -- in utils:::specialCompletions(), we can wrap the result of utils:::specialOpCompletionsHelper() with backticks for non-syntactic names ([1]): comps <- specialOpCompletionsHelper(op, suffix, prefix) if (length(comps) == 0L) comps <- "" +non_syntactic <- make.names(comps) != comps +comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`") sprintf("%s%s%s", prefix, op, comps) I'm somewhat surprised this hasn't come up before (I searched for 'completeToken', 'specialCompletions', and 'specialOpCompletionsHelper' here and on Bugzilla), so I'm checking with the list first if I'm missing anything before filing a patch. Mike C [1] https://github.com/r-devel/r-svn/blob/4657f65a377cb5ef318c6548bc264e3b0f9517a0/src/library/utils/R/completion.R#L536-L538 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
Great suggestion! I've started a patch: https://bugs.r-project.org/show_bug.cgi?id=18479 On Wed, Mar 1, 2023 at 1:56 AM Ivan Krylov wrote: > > В Wed, 1 Mar 2023 01:36:02 -0800 > Michael Chirico via R-devel пишет: > > > +comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`") > > There are a few more corner cases. For example, comps could contain > backticks (which should be escaped with backslashes) and backslashes > (which should also be escaped). Thankfully, \u-style Unicode escape > sequences are not currently supported inside backticks, and "escape the > backslash" rule already takes care of them. > > The deparse() function already knows these rules: > > name <- 'hello world ` \\uFF' > cat(deparse1(as.name(name), backtick=TRUE), '\n') > # `hello world \` \\uFF` > `hello world \` \\uFF` <- 'hello' > `hello world \` \\uFF` > # [1] "hello" > > -- > Best regards, > Ivan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
I personally wouldn't like using a string, and this comment makes me think it's against the r-core preference as well: https://bugs.r-project.org/show_bug.cgi?id=18429#c1 Thanks both for catching the sloppy mistake in vapply() :) Let's continue discussion on the bug/PR. On Thu, Mar 2, 2023 at 12:39 AM Ivan Krylov wrote: > > There turn out to be a few more things to fix. > > One problem is easy to solve: vapply() needs a third argument > specifying the type of the return value. (Can we have unit tests for > tab completion?) > > The other problem is harder: `comps` defaults to an empty string, and > you can't have a symbol consisting of an empty string, because this > value is internally reserved for missing function arguments. I think > you can return(paste0(prefix, op)) if length(comps) == 0L, but this is > still somewhat worrying. R tries to prevent empty names, so I wouldn't > expect specialOpCompletionsHelper() to return an empty string, but I > can't prove it right now. > > On the other hand, x$'a string' is the same as x$`a string`. Could we > just drop as.name() and keep the return value being a string literal? > I'm not sure about this, either. > > -- > Best regards, > Ivan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
I fielded a debugging request from a non-expert user today. At root was running the following: dbGetQuery(connection = conn, query = query) The problem is that they've named the arguments incorrectly -- it should have been [1]: dbGetQuery(conn = conn, statement = query) The problem is that the error message "looks" highly confusing to the untrained eye: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbGetQuery’ for signature ‘"missing", "missing"’ In retrospect, of course, this makes sense -- the mis-named arguments are getting picked up by '...', leaving the required arguments missing. But I was left wondering how we could help users right their own ship here. Would it help to mention the argument names? To include some code checking for weird combinations of missing arguments? Any other suggestions? Mike C [1] https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
I forwarded that along to the original reporter with positive feedback -- including the argument names is definitely a big help for cuing what exactly is missing. Would a patch to do something similar for S4 be useful? On Thu, Aug 10, 2023 at 6:46 AM Hadley Wickham wrote: > > Hi Michael, > > I can't help with S4, but I can help to make sure this isn't a problem > with S7. What do you think of the current error message? Do you see > anything obvious we could do to improve? > > library(S7) > > dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement")) > dbGetQuery(connection = NULL, query = NULL) > #> Error: Can't find method for generic `dbGetQuery(conn, statement)`: > #> - conn : MISSING > #> - statement: MISSING > > Hadley > > On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel > wrote: > > > > I fielded a debugging request from a non-expert user today. At root > > was running the following: > > > > dbGetQuery(connection = conn, query = query) > > > > The problem is that they've named the arguments incorrectly -- it > > should have been [1]: > > > > dbGetQuery(conn = conn, statement = query) > > > > The problem is that the error message "looks" highly confusing to the > > untrained eye: > > > > Error in (function (classes, fdef, mtable) : unable to find an > > inherited method for function ‘dbGetQuery’ for signature ‘"missing", > > "missing"’ > > > > In retrospect, of course, this makes sense -- the mis-named arguments > > are getting picked up by '...', leaving the required arguments > > missing. > > > > But I was left wondering how we could help users right their own ship here. > > > > Would it help to mention the argument names? To include some code > > checking for weird combinations of missing arguments? Any other > > suggestions? > > > > Mike C > > > > [1] > > https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64 > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
Here's a trivial patch that offers some improvement: Index: src/library/methods/R/methodsTable.R === --- src/library/methods/R/methodsTable.R (revision 84931) +++ src/library/methods/R/methodsTable.R (working copy) @@ -752,11 +752,12 @@ if(length(methods) == 1L) return(methods[[1L]]) # the method else if(length(methods) == 0L) { -cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", +cnames <- paste0(head(fdef@signature, length(classes)), "=\"", vapply(classes, as.character, ""), "\"", collapse = ", ") stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), + call. = FALSE, domain = NA) } else Here's the upshot for the example on DBI: dbGetQuery(connection = conn, query = query) Error: unable to find an inherited method for function ‘dbGetQuery’ for signature ‘conn="missing", statement="missing"’ I don't have any confidence about edge cases / robustness of this patch for generic S4 use cases (make check-all seems fine), but I don't suppose a full patch would be dramatically different from the above. Mike C On Thu, Aug 10, 2023 at 12:39 PM Gabriel Becker wrote: > > I just want to add my 2 cents that I think it would be very useful and > beneficial to improve S4 to surface that information as well. > > More information about the way that the dispatch failed would be of great > help in situations like the one Michael pointed out. > > ~G > > On Thu, Aug 10, 2023 at 9:59 AM Michael Chirico via R-devel > wrote: >> >> I forwarded that along to the original reporter with positive feedback >> -- including the argument names is definitely a big help for cuing >> what exactly is missing. >> >> Would a patch to do something similar for S4 be useful? >> >> On Thu, Aug 10, 2023 at 6:46 AM Hadley Wickham wrote: >> > >> > Hi Michael, >> > >> > I can't help with S4, but I can help to make sure this isn't a problem >> > with S7. What do you think of the current error message? Do you see >> > anything obvious we could do to improve? >> > >> > library(S7) >> > >> > dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement")) >> > dbGetQuery(connection = NULL, query = NULL) >> > #> Error: Can't find method for generic `dbGetQuery(conn, statement)`: >> > #> - conn : MISSING >> > #> - statement: MISSING >> > >> > Hadley >> > >> > On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel >> > wrote: >> > > >> > > I fielded a debugging request from a non-expert user today. At root >> > > was running the following: >> > > >> > > dbGetQuery(connection = conn, query = query) >> > > >> > > The problem is that they've named the arguments incorrectly -- it >> > > should have been [1]: >> > > >> > > dbGetQuery(conn = conn, statement = query) >> > > >> > > The problem is that the error message "looks" highly confusing to the >> > > untrained eye: >> > > >> > > Error in (function (classes, fdef, mtable) : unable to find an >> > > inherited method for function ‘dbGetQuery’ for signature ‘"missing", >> > > "missing"’ >> > > >> > > In retrospect, of course, this makes sense -- the mis-named arguments >> > > are getting picked up by '...', leaving the required arguments >> > > missing. >> > > >> > > But I was left wondering how we could help users right their own ship >> > > here. >> > > >> > > Would it help to mention the argument names? To include some code >> > > checking for weird combinations of missing arguments? Any other >> > > suggestions? >> > > >> > > Mike C >> > > >> > > [1] >> > > https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64 >> > > >> > > __ >> > > R-devel@r-project.org mailing list >> > > https://stat.ethz.ch/mailman/listinfo/r-devel >> > >> > >> > >> > -- >> > http://hadley.nz >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
> I'm not entirely sure about the extra call. = FALSE My thinking is the signature of the internal .InheritForDispatch() is not likely to help anyone, in fact having the opposite effect for beginners unsure how to use that info. > Now I'd like to find an example that only uses packages with priority base | > Recommended Sure, here are a few. library(Matrix) # searching for Matrix-owned generics matrix_generics <- getGenerics(where = asNamespace("Matrix")) matrix_generics@.Data[matrix_generics@package == "Matrix"] # simple signature, one argument 'x' symmpart() # Error: unable to find an inherited method for function ‘symmpart’ for signature ‘x="missing"’ # more complicated signature, especially including ... Cholesky(a = 1) # Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’ Cholesky(a = 1, perm = TRUE) # Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’ Cholesky(a = 1, perm = TRUE, IMult = 2) # Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’ --- 'base' is a bit harder since stats4 just provides classes over stats functions, so the missigness error comes from non-S4 code. library(stats4) coef() # Error in coef.default() : argument "object" is missing, with no default Defining our own generic: setGeneric("BaseGeneric", \(a, ...) standardGeneric("BaseGeneric")) BaseGeneric() # Error: unable to find an inherited method for function ‘BaseGeneric’ for signature ‘a="missing"’ # getting multiple classes to show up requires setting the signature: setMethod("BaseGeneric", signature(x = "double", y = "double"), \(x, y, ...) x + y) BaseGeneric(X = 1, Y = 2) # Error: unable to find an inherited method for function ‘BaseGeneric’ for signature ‘x="missing", y="missing"’ On Fri, Aug 11, 2023 at 2:26 AM Martin Maechler wrote: > > >>>>> Michael Chirico via R-devel > >>>>> on Thu, 10 Aug 2023 23:56:45 -0700 writes: > > > Here's a trivial patch that offers some improvement: > > > Index: src/library/methods/R/methodsTable.R > > === > > --- src/library/methods/R/methodsTable.R (revision 84931) > > +++ src/library/methods/R/methodsTable.R (working copy) > > @@ -752,11 +752,12 @@ > > if(length(methods) == 1L) > > return(methods[[1L]]) # the method > > else if(length(methods) == 0L) { > > -cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", > > +cnames <- paste0(head(fdef@signature, length(classes)), "=\"", > > vapply(classes, as.character, ""), "\"", > > collapse = ", ") > > stop(gettextf("unable to find an inherited method for function %s > > for signature %s", > > sQuote(fdef@generic), > > sQuote(cnames)), > > + call. = FALSE, > > domain = NA) > > } > > else > > > Here's the upshot for the example on DBI: > > > dbGetQuery(connection = conn, query = query) > > Error: unable to find an inherited method for function ‘dbGetQuery’ > > for signature ‘conn="missing", statement="missing"’ > > > I don't have any confidence about edge cases / robustness of this > > patch for generic S4 use cases (make check-all seems fine), > > Good you checked, but you are right that that's not all enough to be sure: > > Checking error *messages* is not something we do often {not > the least because you'd need to consider message translations > and hence ensure you only check in case of English ...}. > > > but I don't suppose a full patch would be dramatically different from > the > > above. > > I agree: The patch looks to make sense to me, too, > while I'm not entirely sure about the extra call. = FALSE > (which I of course understand you'd prefer for the above example) > > Now I'd like to find an example that only uses packages with priority > base | Recommended > > Martin > > -- > Martin Maechler > ETH Zurich and R Core team > > > > Mike C > > > On Thu, Aug 10, 2023 at 12:39 PM Gabriel Becker > wrote: > >> > >> I just want to add my 2 cents that I think it would be very useful and > beneficial to improve S4 to surface that information as well. > >> > >> Mo
[Rd] Unclear provenance of message from S4
MRE to produce the message is the following: setClass("Foo") setOldClass("Bar") setAs("Bar", "Foo", \(x) x) # NOTE: arguments in definition for coerce changed from (x) to (from) In an interactive setting, that may be fine, but I first encountered this message in the install log of a package for which I am not the author (nor do I have any context on the package, really). As such it took me much longer than it otherwise would have to pick up on the connection between 'coerce' and 'setAs' (as searching the sources for 'coerce' yielded nothing). That the faulty argument is the ubiquitous 'x' is also a major impediment. It would be much better to surface the relationship to 'setAs' in this message. I believe that logic has to live in setAs itself as the delegated substituteFunctionArgs loses the proper context (unless we dare to resort to sys.calls()). Sharing here first per usual as I am not very familiar with S4, in case this all is much clearer to someone with a sharper eye for S4. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] FR: valid_regex() to test string validity as a regular expression
It will be useful to package authors trying to validate input which is supposed to be a valid regular expression. As near as I can tell, the only way we can do so now is to run any regex function and check for the warning and/or condition to bubble up: valid_regex <- function(str) { stopifnot(is.character(str), length(str) == 1L) !inherits(tryCatch(grepl(str, ""), condition = identity), "condition") } That's pretty hefty/inscrutable for such a simple validation. I see a variety of similar approaches in CRAN packages [1], all slightly different. It would be good for R to expose a "canonical" way to run this validation. At root, the problem is that R does not expose the regex compilation routines like 'tre_regcomp', so from the R side we have to resort to hacky approaches. Things get slightly complicated by encoding/useBytes modes (tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb, tre_regncompb; all in tre.h), but all are already present in other regex routines, so this is doable. Exposing a function to compile regular expressions is common in other languages, e.g. Go [2], Python [3], JavaScript [4]. [1] https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code [2] https://pkg.go.dev/regexp#Compile [3] https://docs.python.org/3/library/re.html#re.compile [4] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] FR: valid_regex() to test string validity as a regular expression
> Grepping an empty string might work in many cases... That's precisely why a base R offering is important, as a surer way of validating in all cases. To be clear I am trying to directly access the results of tre_regcomp(). > it is probably more portable to simply be prepared to propagate such errors from the actual use on real inputs That works best in self-contained calls -- foo(re) and we execute re inside foo(). But the specific context where I found myself looking for a regex validator is more complicated (https://github.com/r-lib/lintr/pull/2225). User supplies a regular expression in a configuration file, only "later" is it actually supplied to grepl(). Till now, we've done your suggestion -- just surface the regex error at run time. But our goal is to make it friendlier and fail earlier at "compile time" as the config is loaded, "long" before any regex is actually executed. At a bare minimum this is a good place to return a classed warning (say invalid_regex_warning) to allow finer control than tryCatch(condition=). On Mon, Oct 9, 2023, 11:30 PM Tomas Kalibera wrote: > > On 10/10/23 01:57, Michael Chirico via R-devel wrote: > > It will be useful to package authors trying to validate input which is > supposed to be a valid regular expression. > > As near as I can tell, the only way we can do so now is to run any > regex function and check for the warning and/or condition to bubble > up: > > valid_regex <- function(str) { > stopifnot(is.character(str), length(str) == 1L) > !inherits(tryCatch(grepl(str, ""), condition = identity), "condition") > } > > That's pretty hefty/inscrutable for such a simple validation. I see a > variety of similar approaches in CRAN packages [1], all slightly > different. It would be good for R to expose a "canonical" way to run > this validation. > > At root, the problem is that R does not expose the regex compilation > routines like 'tre_regcomp', so from the R side we have to resort to > hacky approaches. > > Hi Michael, > > I don't think you need compilation functions for that. If a regular > expression is found invalid by a specific third party library R uses, the > library should return and error to R and R should return an error to you, > and you should probably propagate that to your users. Grepping an empty > string might work in many cases as a test, but it is probably more portable > to simply be prepared to propagate such errors from the actual use on real > inputs. In theory, there could be some optimization for a particular case, > the checking may not be the same - but that is the same say for compilation > and checking. > > Things get slightly complicated by encoding/useBytes modes > (tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb, > tre_regncompb; all in tre.h), but all are already present in other > regex routines, so this is doable. > > Re encodings, simply R strings should be valid in their encoding. This is > not just for regular expressions but also for anything else. You shouldn't > assume that R can handle invalid strings in any reasonable way. Definitely > you shouldn't try adding invalid strings in tests - behavior with invalid > strings is unspecified. To test whether a string is valid, there is > validEnc() (or validUTF8()). But, again, it is probably safest to propagate > errors from the regular expression R functions (in case the checks differ, > particularly for non-UTF-8), also, duplicating the encoding checks can be a > non-trivial overhead. > > If there was a strong need to have an automated way to somehow classify > specifically errors from the regex libraries, perhaps R could attach some > classes to them when the library tells. > > Tomas > > Exposing a function to compile regular expressions is common in other > languages, e.g. Go [2], Python [3], JavaScript [4]. > > [1] > https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code > [2] https://pkg.go.dev/regexp#Compile > [3] https://docs.python.org/3/library/re.html#re.compile > [4] > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp > > __r-de...@r-project.org mailing > listhttps://stat.ethz.ch/mailman/listinfo/r-devel > > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] FR: valid_regex() to test string validity as a regular expression
Great to know this exists in package space! Of course, using re2 validation for a regex to be executed with TRE (via grep*) is just begging for trouble (e.g. [1] suggests re2 is closer to PCRE, [2] says "mostly" PCRE compatible). And overhauling everything to use re2 just for regex validation is hardly practical. [1] https://laurikari.net/tre/google-releases-the-re2-library/ [2] https://hackerboss.com/is-your-regex-matcher-up-to-snuff/ On Wed, Oct 11, 2023 at 4:02 PM Toby Hocking wrote: > > Hi Michael, it sounds like you don't want to use a CRAN package for > this, but you may try re2, see below. > > > grepl("(invalid","subject",perl=TRUE) > Error in grepl("(invalid", "subject", perl = TRUE) : > invalid regular expression '(invalid' > In addition: Warning message: > In grepl("(invalid", "subject", perl = TRUE) : > PCRE pattern compilation error > 'missing closing parenthesis' > at '' > > > grepl("(invalid","subject",perl=FALSE) > Error in grepl("(invalid", "subject", perl = FALSE) : > invalid regular expression '(invalid', reason 'Missing ')'' > In addition: Warning message: > In grepl("(invalid", "subject", perl = FALSE) : > TRE pattern compilation error 'Missing ')'' > > > re2::re2_regexp("(invalid") > Error: missing ): (invalid > > On Tue, Oct 10, 2023 at 7:57 AM Michael Chirico via R-devel > wrote: > > > > > Grepping an empty string might work in many cases... > > > > That's precisely why a base R offering is important, as a surer way of > > validating in all cases. To be clear I am trying to directly access the > > results of tre_regcomp(). > > > > > it is probably more portable to simply be prepared to propagate such > > errors from the actual use on real inputs > > > > That works best in self-contained calls -- foo(re) and we execute re inside > > foo(). > > > > But the specific context where I found myself looking for a regex validator > > is more complicated (https://github.com/r-lib/lintr/pull/2225). User > > supplies a regular expression in a configuration file, only "later" is it > > actually supplied to grepl(). > > > > Till now, we've done your suggestion -- just surface the regex error at run > > time. But our goal is to make it friendlier and fail earlier at "compile > > time" as the config is loaded, "long" before any regex is actually executed. > > > > At a bare minimum this is a good place to return a classed warning (say > > invalid_regex_warning) to allow finer control than tryCatch(condition=). > > > > On Mon, Oct 9, 2023, 11:30 PM Tomas Kalibera > > wrote: > > > > > > > > On 10/10/23 01:57, Michael Chirico via R-devel wrote: > > > > > > It will be useful to package authors trying to validate input which is > > > supposed to be a valid regular expression. > > > > > > As near as I can tell, the only way we can do so now is to run any > > > regex function and check for the warning and/or condition to bubble > > > up: > > > > > > valid_regex <- function(str) { > > > stopifnot(is.character(str), length(str) == 1L) > > > !inherits(tryCatch(grepl(str, ""), condition = identity), "condition") > > > } > > > > > > That's pretty hefty/inscrutable for such a simple validation. I see a > > > variety of similar approaches in CRAN packages [1], all slightly > > > different. It would be good for R to expose a "canonical" way to run > > > this validation. > > > > > > At root, the problem is that R does not expose the regex compilation > > > routines like 'tre_regcomp', so from the R side we have to resort to > > > hacky approaches. > > > > > > Hi Michael, > > > > > > I don't think you need compilation functions for that. If a regular > > > expression is found invalid by a specific third party library R uses, the > > > library should return and error to R and R should return an error to you, > > > and you should probably propagate that to your users. Grepping an empty > > > string might work in many cases as a test, but it is probably more > > > portable > > > to simply be prepared to propagate such errors from the actual use on real > > > inputs. In theory, there could be some optimization for a parti