Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided
> Gabriel Becker
>     on Thu, 2 Mar 2023 14:37:18 -0800 writes:

    > On Thu, Mar 2, 2023 at 2:02 PM Antoine Fabri wrote:

    >> Thanks and good point about unspecified behavior. The way
    >> it behaves now (when it doesn't ignore) is more
    >> consistent with data.frame() though so I prefer that to a
    >> "warn and ignore" behaviour:
    >>
    >> data.frame(a = 1, b = 2, 3)
    >> #>   a b X3
    >> #> 1 1 2  3
    >>
    >> data.frame(a = 1, 2, 3)
    >> #>   a X2 X3
    >> #> 1 1  2  3
    >>
    >> (and in general warnings make for unpleasant debugging so
    >> I prefer when we don't add new ones if avoidable)

    > I find silence to be much more unpleasant in practice when
    > debugging, myself, but that may be a personal preference.

+1

I also *strongly* disagree with the claim

   " in general warnings make for unpleasant debugging "

That may be true for beginners (for whom debugging is often not
really feasible anyway ..), but somewhat experienced useRs should
know about

    options(warn = 1) # or
    options(warn = 2) # plus
    options(error = recover) # or
    tryCatch( ..., warning = ..)

or {even more}

Martin

--
Martin Maechler
ETH Zurich and R Core team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
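A minimal sketch of how the tools named above can be combined, added
here for illustration and not taken from the original message. The
functions f() and g() are hypothetical. The first part relies on
options(warn = 2) turning a warning into an error that tryCatch() can
intercept; the second uses withCallingHandlers() to record the call
stack at the moment the warning is signalled.

```r
# Hypothetical functions, invented for this illustration only.
g <- function(x) as.numeric(x)            # warns on non-numeric strings
f <- function(x) sum(g(x), na.rm = TRUE)

# 1. options(warn = 2) promotes warnings to errors, so the error handler
#    of tryCatch() receives the converted warning.
old <- options(warn = 2)
res <- tryCatch(f(c("1", "2", "foo")),
                error = function(e) conditionMessage(e))
options(old)                              # restore the previous setting

# 2. withCallingHandlers() runs its handler while the stack is intact,
#    so sys.calls() shows which calls led to the warning.
call_stack <- NULL
val <- withCallingHandlers(
  f(c("1", "2", "foo")),
  warning = function(w) {
    call_stack <<- sys.calls()            # record the calls on the stack
    invokeRestart("muffleWarning")        # then carry on without printing
  }
)

# Show the first line of each recorded call.
vapply(call_stack, function(cl) deparse(cl)[1], character(1))
```

Because the second handler is a calling handler, the computation
resumes after the warning and `val` still holds the result of f().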
Re: [Rd] confusing all.equal output
> peter dalgaard
>     on Thu, 2 Mar 2023 19:47:59 +0100 writes:

    > I believe the wording goes back to Martin Maechler many
    > moons ago (AFAICT towards the end of the last millennium.)
    > We might leave it to him to change it?

    > - Peter D.

Thank you, Peter.

Yes, this is *very* old. I could claim that R users seem to get
more and more confused over time, because nobody had ever
complained for a quarter of a century .. (;-) ;-)

I know I had been inspired by the all.equal() implementation of
S-PLUS version 3.x (x = 4, IIRC) at the time, but then I also
think that I have to take the "full blame" on this :

Trying to think like myself "yesterday, when I was young ..",
I guess the argumentation for using is.NA was what I considered
helpful to the non experienced S / R user at the time:
Everybody has seen 'NA' before (and they see it in their objects
in this case) but only somewhat more experienced useRs would
know about is.na().
.. and it may be that at the time I found it "slick" to combine
the "NA" and "is.na" into "is.NA" ...

About the other wording and how the mismatches should be counted,
I have no recollection.

But indeed, already in 1999, i.e., before R 1.0.0 existed, that
part of the code was

    out <- is.na(target)
    if(any(out != is.na(current)))
        return(paste("`is.NA' value mismatches:", sum(is.na(current)),
                     "in current,", sum(out), " in target"))

- - -

Ok, now I need to work to commit a (completely orthogonal) change
to all.equal.numeric() which had been lying around with me for
about a year at least... so I can start looking at your proposed
changes ...

Martin

    >> On 2 Mar 2023, at 19:30 , avi.e.gr...@gmail.com wrote:
    >>
    >> I think if you step back, you can ask what the purpose of
    >> an error message is and who designs it.
    >>
    >> Is the message for the developer or others on their team
    >> or something an end-user knowing nothing about R will
    >> see.
    >>
    >> This reminds me a bit of legal mumbo jumbo that turns
    >> many reading it off as it keeps talking about the party
    >> of the first part or the plaintiff as compared to
    >> somewhat straighter talk.
    >>
    >> The scenario is that you are comparing two things. Their
    >> names are not things like "target" or "current" so even
    >> other programmers not involved in your code will pause
    >> and wonder.
    >>
    >> One view is to use phrases like first and second
    >> arguments/lists/whatever. You might talk about the one
    >> on the left (but using LHS is a bit opaque) versus the
    >> one on the right.
    >>
    >> But sometimes it can be too verbose. Sometimes the error
    >> message is being generated not where everything is clear.
    >>
    >> So ideally you could say:
    >>
    >> WARNING Danger Will Robinson. Comparing two things for
    >> equality. Result finds mismatches. There were NA found
    >> on the (left or right) that were not matched on the other
    >> side. Number of such found: 2
    >>
    >> If you had a Systems Engineer write detailed requirements
    >> that included something a bit better than the example and
    >> the programmer was able to supply the data using the
    >> words and guidelines, it might fit some needs but maybe
    >> not satisfy other programmers. But there are human
    >> factors people whose job it is to help choose among
    >> alternatives and although they may not choose well,
    >> letting a programmer come up with whatever they feel like
    >> is generally worse.
    >>
    >> Yes, in their microcosm centered on a dozen lines of
    >> code, "current" and "target" may have meaning. But are
    >> they the intended user of the product?
    >>
    >> -----Original Message-----
    >> From: R-devel On Behalf Of Antoine Fabri
    >> Sent: Thursday, March 2, 2023 12:23 PM
    >> To: peter dalgaard
    >> Cc: R-devel
    >> Subject: Re: [Rd] confusing all.equal output
    >>
    >> Good points. I don't mind the terminology since target
    >> and current are the names of the arguments. As the
    >> function is already designed to stop at the first failing
    >> check we might not need to enumerate or count the
    >> mismatches, instead we could have "`NA` found in `target`
    >> but not in `current` at position "

    > --
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Office: A 4.23
    > Email: pd@cbs.dk  Priv: pda...@gmail.com
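The position-based wording suggested in this thread can be sketched as
follows. This is only an illustration under stated assumptions:
na_mismatch_message() is a hypothetical helper, not part of base R and
not the change that was eventually made to all.equal(); it assumes two
vectors of equal length.

```r
# Hypothetical helper, not part of base R: reports the positions at which
# the NA patterns of `target` and `current` differ.
na_mismatch_message <- function(target, current) {
  in_target  <- is.na(target)
  in_current <- is.na(current)
  only_target  <- which(in_target & !in_current)
  only_current <- which(in_current & !in_target)
  msgs <- character()
  if (length(only_target))
    msgs <- c(msgs, sprintf(
      "`NA` found in `target` but not in `current` at position %s",
      paste(only_target, collapse = ", ")))
  if (length(only_current))
    msgs <- c(msgs, sprintf(
      "`NA` found in `current` but not in `target` at position %s",
      paste(only_current, collapse = ", ")))
  if (length(msgs)) msgs else TRUE
}

# Both vectors contain two NAs, but not in the same places, so a message
# that only counts NAs per argument cannot show where they differ.
target  <- c(1, NA, 3, NA)
current <- c(1, 2, NA, NA)

all.equal(target, current)             # base R message, for comparison
na_mismatch_message(target, current)   # sketch: reports positions
```

Returning TRUE when the NA patterns agree mirrors the convention of
all.equal(), which returns either TRUE or a character vector.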
Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided
Let me expand a bit, I might have expressed myself poorly.

If there is a good reason for a warning I want a warning, and because I
take them seriously I don't want my console cluttered with those that can
be avoided. I strongly believe we should strive to make our code silent,
and I like my console to tell me only what I need to know. In my opinion
many warnings would be better designed as errors, sometimes with an
argument to opt in to the behaviour, or a documented way to work around.
Some other warnings should just be documented behavior, because the
behavior is not all that surprising.

Some reasons why I find warnings hard to debug:
- options(warn = 1) is not always enough to spot the source of the warning
- options(warn = 2) fails at every warning, including the ones that are not
  interesting to the user and that they may not do anything about, in these
  cases you'll have to find a way to shut off the first to get to the
  second, and if it's packaged code that's not fun.
- Unlike with errors, traceback() won't help.
- tryCatch() will help you only if you call it at the right place, assuming
  you've found it.
- We might also have many harmless warnings triggered through loops and
  hiding important ones.
- When you are sure that you are OK with your code despite the warning, say
  `as.numeric(c("1", "2", "foo"))`, a workaround might be expensive (here we
  could use regex first to ditch the non numeric strings but who does that)
  so you're tempted to use `suppressWarnings()`, but then you might be
  suppressing other important warnings so you just made your code less safe
  because the developer wanted to make it safer (you might say it's on the
  user but still, we get suboptimal code that was avoidable).

Of course I might miss some approaches that would make my experience of
debugging warnings more pleasant.

In our precise case I don't find the behavior surprising enough to warrant
more precious red ink since it's close to what we get with data.frame(),
and close to what we get with dplyr::mutate() FWIW, so I'd be personally
happier to have this documented and work silently.

Either way I appreciate you considering the problem.

Thanks,

Antoine
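One possible way around the suppressWarnings() concern raised above is
to muffle only the warning that is known to be harmless. The sketch
below is an illustration, not something proposed in the thread:
muffle_matching() and noisy() are hypothetical, and matching on the
message text is an assumption that only holds for untranslated
(English) messages.

```r
# Hypothetical helper: silence only warnings whose message matches
# `pattern`; every other warning is passed on untouched.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w)))
        invokeRestart("muffleWarning")
    }
  )
}

# Hypothetical function that emits two different warnings.
noisy <- function(x) {
  warning("something else went wrong")   # a warning we still want to see
  as.numeric(x)                          # a coercion warning we accept
}

# The outer handler records whatever gets past muffle_matching().
seen <- character()
out <- withCallingHandlers(
  muffle_matching(noisy(c("1", "2", "foo")), "NAs introduced by coercion"),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

seen   # only the warning that was not muffled by the pattern
out    # the coerced values, computed despite both warnings
```

Unlike suppressWarnings() called without further arguments, this keeps
unrelated warnings visible to the caller.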
Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided
For what it's worth I think the increased emphasis on classed errors
should help with this (i.e., it will be easier to filter out errors you
know are false positives/irrelevant for your use case).

On Fri, Mar 3, 2023 at 12:17 PM Antoine Fabri wrote:
>
> Let me expand a bit, I might have expressed myself poorly.
>
> If there is a good reason for a warning I want a warning, and because I
> take them seriously I don't want my console cluttered with those that can
> be avoided. I strongly believe we should strive to make our code silent,
> and I like my console to tell me only what I need to know. In my opinion
> many warnings would be better designed as errors, sometimes with an
> argument to opt in the behaviour, or a documented way to work around. Some
> other warnings should just be documented behavior, because the behavior is
> not all that surprising.
>
> Some reasons why I find warnings hard to debug:
> - options(warn = 1) is not always enough to spot the source of the warning
> - options(warn = 2) fails at every warning, including the ones that are not
> interesting to the user and that they may not do anything about, in these
> cases you'll have to find a way to shut off the first to get to the second,
> and if it's packaged code that's not fun.
> - Unlike with errors, traceback() won't help.
> - tryCatch() will help you only if you call it at the right place, assuming
> you've found it.
> - We might also have many harmless warnings triggered through loops and
> hiding important ones.
> - When you are sure that you are OK with your code despite the warning, say
> `as.numeric(c("1", "2", "foo"))`, a workaround might be expensive (here we
> could use regex first to ditch the non numeric strings but who does that)
> so you're tempted to use `suppressWarnings()`, but then you might be
> suppressing other important warnings so you just made your code less safe
> because the developer wanted to make it safer (you might say it's on the
> user but still, we get suboptimal code that was avoidable).
>
> Of course I might miss some approaches that would make my experience of
> debugging warnings more pleasant.
>
> In our precise case I don't find the behavior surprising enough to warrant
> more precious red ink since it's close to what we get with data.frame(),
> and close to what we get with dplyr::mutate() FWIW, so I'd be personally
> happier to have this documented and work silently.
>
> Either way I appreciate you considering the problem.
>
> Thanks,
>
> Antoine
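A minimal sketch of what filtering on classed conditions could look
like, assuming the signalling code attaches a class to its warning.
Nothing here reflects how transform() actually behaves: add_columns()
and the class name "unnamed_argument_warning" are invented for the
illustration. warningCondition() is available in base R from version
3.6.0.

```r
# Hypothetical function that signals a classed warning when it drops
# unnamed arguments.
add_columns <- function(df, ...) {
  args <- list(...)
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  unnamed <- !nzchar(nms)
  if (any(unnamed)) {
    warning(warningCondition(
      sprintf("%d unnamed argument(s) ignored", sum(unnamed)),
      class = "unnamed_argument_warning"   # invented class name
    ))
  }
  for (nm in nms[!unnamed]) df[[nm]] <- args[[nm]]
  df
}

# A caller who knows this warning is irrelevant can muffle exactly this
# class; warnings of any other class are unaffected.
res <- withCallingHandlers(
  add_columns(data.frame(a = 1), b = 2, 3),
  unnamed_argument_warning = function(w) invokeRestart("muffleWarning")
)

# The same class can be used to react to the warning instead.
caught <- tryCatch(
  add_columns(data.frame(a = 1), 3),
  unnamed_argument_warning = function(w) conditionMessage(w)
)
```

Handlers keyed on a class do not depend on the wording or translation
of the message, which is the main advantage over matching message text.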
[Rd] Augment base::replace(x, list, value) to allow list= to be a predicate?
Dear All,

Currently, list= in base::replace(x, list, value) has to be an index
vector. For me, at least, the most common use case is for list= to be
some simple property of elements of x, e.g.,

    x <- c(1,2,NA,3)
    replace(x, is.na(x), 0)

Particularly when using R pipes, which don't allow multiple
substitutions, it would simplify many of such cases if list= could be a
function that returns an index, e.g.,

    replace <- function (x, list, values, ...) {
      # Here, list() refers to the argument, not the built-in.
      if(is.function(list)) list <- list(x, ...)
      x[list] <- values
      x
    }

Then, the following is possible:

    c(1,2,NA,3) |> replace(is.na, 0)

Any thoughts?

Pavel
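For comparison, a sketch of what already works with the native pipe and
of the proposal written as a separate wrapper. The name replace_if() is
hypothetical and chosen only to avoid masking base::replace(); the
anonymous-function form assumes R 4.1.0 or later.

```r
x <- c(1, 2, NA, 3)

# Works today: wrap the call in an anonymous function so that the piped
# value can be referenced more than once.
y1 <- x |> (\(v) replace(v, is.na(v), 0))()

# Hypothetical wrapper around base::replace(). If `list` is a function,
# it is applied to `x` (with any extra arguments) to compute the index.
replace_if <- function(x, list, values, ...) {
  if (is.function(list)) list <- list(x, ...)
  replace(x, list, values)
}

y2 <- x |> replace_if(is.na, 0)      # predicate form
y3 <- x |> replace_if(3, 0)          # plain index vectors still work

identical(y1, y2)
```

Keeping the wrapper separate leaves existing calls to replace()
unchanged, at the cost of one more function name to remember.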
Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided
I am probably mistaken but it looks to me like the design of much of the
data.frame infrastructure not only does not insist you give columns
names, but even has all kinds of options such as check.names and
fix.empty.names

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame

During the lifetime of a column, it can get removed, renamed,
transformed in many ways and so on. A data.frame read in from a file
such as a .CSV often begins with temporary created names. It is so
common, that sometimes not giving a name is a choice and not in any way
an error. I have seen some rather odd names in backticks that include
spaces and seen duplicate names. The reality is you can index by column
number two and maybe no actual name was needed by the one creating or
modifying the data.

Some placed warnings are welcome as they tend to reflect a possibly
serious error. But that error may not easily be at this point versus
later in the game. If later the program tries to access the misnamed
column, then an error makes sense.

Warnings, if overused, get old quickly and you regularly see code
written to suppress startup messages or warnings because the same
message shown every day becomes something you ignore mentally even if
not suppressed. How many times has loading the tidyverse reminded me it
is shadowing a few base R functions? How many times have I really cared?

What makes some sense to me is to add an argument to some functions
BEGGING to be shown the errors of your ways and turn that on as you
wish, often after something has gone wrong.

-----Original Message-----
From: R-devel On Behalf Of Martin Maechler
Sent: Friday, March 3, 2023 10:26 AM
To: Gabriel Becker
Cc: Antoine Fabri; R-devel
Subject: Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided

> Gabriel Becker
>     on Thu, 2 Mar 2023 14:37:18 -0800 writes:

    > On Thu, Mar 2, 2023 at 2:02 PM Antoine Fabri wrote:

    >> Thanks and good point about unspecified behavior. The way
    >> it behaves now (when it doesn't ignore) is more
    >> consistent with data.frame() though so I prefer that to a
    >> "warn and ignore" behaviour:
    >>
    >> data.frame(a = 1, b = 2, 3)
    >> #>   a b X3
    >> #> 1 1 2  3
    >>
    >> data.frame(a = 1, 2, 3)
    >> #>   a X2 X3
    >> #> 1 1  2  3
    >>
    >> (and in general warnings make for unpleasant debugging so
    >> I prefer when we don't add new ones if avoidable)

    > I find silence to be much more unpleasant in practice when
    > debugging, myself, but that may be a personal preference.

+1

I also *strongly* disagree with the claim

   " in general warnings make for unpleasant debugging "

That may be true for beginners (for whom debugging is often not
really feasible anyway ..), but somewhat experienced useRs should
know about

    options(warn = 1) # or
    options(warn = 2) # plus
    options(error = recover) # or
    tryCatch( ..., warning = ..)

or {even more}

Martin

--
Martin Maechler
ETH Zurich and R Core team
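The two data.frame() options mentioned above can be tried directly. The
following is a small sketch using only documented arguments of
data.frame(); the column name `my col` is made up for the example.

```r
# Unnamed arguments receive automatically constructed names by default.
d1 <- data.frame(a = 1, 2, 3)
names(d1)

# fix.empty.names = FALSE leaves the names of unnamed arguments empty.
d2 <- data.frame(a = 1, 2, 3, fix.empty.names = FALSE)
names(d2)

# check.names = FALSE keeps a syntactically invalid name as given,
# while the default converts it with make.names().
d3 <- data.frame(`my col` = 1, check.names = FALSE)
d4 <- data.frame(`my col` = 1)
names(d3)
names(d4)

# Columns can be addressed by position regardless of their names.
second <- d2[[2]]
```

Indexing by position is what makes the empty names in `d2` usable at
all, since `d2$...` has nothing to match against.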