Re: [Rd] quantile() names
> Gabriel Becker
> on Mon, 14 Dec 2020 13:23:00 -0800 writes:

> Hi Edgar, I certainly don't think quantile(x, .975) should
> return 980, as that is a completely wrong answer.

> I do agree that it seems like the name is a bit
> offputting. I'm not sure how deep in the machinery you'd
> have to go to get digits to have no effect on the names (I
> don't have time to dig in right this second).

> On the other hand, though, if we're going to make the
> names not respect digits entirely, what do we do when
> someone does quantile(x, 1/3)? That'd be a bad time had by
> all without digits coming to the rescue, I think.

> Best, ~G

and now we read more replies on this topic without anyone looking at
the pure R source code which is pretty simple and easy.

Instead, people do experiments and take time to muse about their findings..

Honestly, I'm disappointed: I've always thought that if you
*write* on R-devel, you should be able to figure out a few
things yourself before that..

It's not rocket science to see/know that you need to quickly look at
the quantile.default() method function and then to note that
it's format_perc(.) which is used to create the names.

Almost surely, I've been a bit involved in creating parts of
this and probably am responsible for the current default
behavior.

(sounds of digging) ...
--> Yes:

r837 | maechler | 1998-03-05 12:20:37 +0100 (Thu, 05 Mar 1998) | 2 lines
Changed paths:
   M /trunk/src/library/base/R/quantile
   M /trunk/src/library/base/man/quantile.Rd

fixed names(.) construction

With this diff (my 'svn-diffB -c837 quantile') :

Index: quantile
===
21c21,23
<     names(qs) <- paste(round(100 * probs), "%", sep = "")
---
>     names(qs) <- paste(formatC(100 * probs, format= "fg", wid=1,
>                                dig= max(2,.Options$digits)),
>                        "%", sep = "")

- so this was before this was modularized into the format_perc() utility
  and quite a while before R 1.0.0

Now, 22.8 years later, I do think that indeed it was not
necessarily the best idea to make the names() construction
depend on the 'digits' option entirely and just protect it by
using at least 2 digits.

What I think is better is to

1) provide an optional argument 'digits = 7'
   back compatible w/ default getOption("digits")

2) when used, check that it is at least '1'

But then some scripts / examples of some people *will* change ...,
e.g., because they preferred to have a global setting of digits=5

so I'm guessing it may make more people unhappy than other
people happy if we change this now, after close to 23 years .. ??

Martin

--
Martin Maechler
ETH Zurich and R Core team

> On Mon, Dec 14, 2020 at 11:55 AM Merkle, Edgar C. wrote:

>> All,
>>
>> Consider the code below
>>
>> options(digits=2)
>> x <- 1:1000
>> quantile(x, .975)
>>
>> The value returned is 975 (the 97.5th percentile), but
>> the name has been shortened to "98%" due to the digits
>> option. Is this intended? I would have expected the name
>> to also be "97.5%" here. Alternatively, the returned
>> value might be 980 in order to match the name of "98%".
>>
>> Best, Ed
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
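The two name constructions in the r837 diff can be compared directly, without relying on how any particular R version implements quantile(). What follows is only a minimal sketch: names_rounded() and names_fg() are hypothetical helper names (they are not functions in base R), and names_fg() takes 'digits' as an explicit argument, in the spirit of the proposal above, instead of reading .Options$digits.

```r
# Hypothetical helpers (not part of base R) mirroring the two sides of the diff.

# Before r837: round to whole percentages.
names_rounded <- function(probs) {
  paste(round(100 * probs), "%", sep = "")
}

# After r837, but with 'digits' as an explicit argument (an assumption made
# for illustration); the max(2, .) guard is kept from the original code.
names_fg <- function(probs, digits = 7) {
  stopifnot(is.numeric(digits), length(digits) == 1, digits >= 1)
  paste(formatC(100 * probs, format = "fg", width = 1,
                digits = max(2, digits)),
        "%", sep = "")
}

names_fg(0.975, digits = 2)  # "98%", the name reported in the original post
names_fg(0.975, digits = 7)  # "97.5%"
names_fg(1/3,   digits = 7)  # "33.33333%"
names_rounded(1/3)           # "33%"
```

With seven significant digits the 97.5th percentile gets the expected label, while quantile(x, 1/3) gets a long one, which is the trade-off raised in the quoted reply.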
[Rd] power.prop.test() documentation question
Hi All,

Based upon a discussion on power/sample size calculations on another, non-R related, list, some light bulbs went on regarding the assumptions of what type of statistical test is going to be used with various power/sample size calculators/functions for proportions. In some cases, this is clearly stated, in others, it is not.

In the case of power.prop.test() and comparing outputs against other calculators, there appears to be an implied presumption that an un-corrected chi-square test will be used, as opposed to a corrected chi-square or Fisher Exact Test (FET), in the 2x2 case. Sample sizes for the un-corrected chi-square will generally be smaller than either the corrected chi-square or the FET, given similar inputs, where the latter two, not surprisingly given their common conservative bias, will yield similar sample size results.

This is not explicitly documented in ?power.prop.test, though it is in some other applications, as noted above.

As a particular example from the other discussions, using p1 = 0.142, p2 = 0.266, with power = 0.8 and sig.level = 0.05, power.prop.test() yields a sample size of ~165 per group. Other calculators that presume either a corrected chi-square or the FET, yield ~180 per group.

I raise this issue, as should one use the function to calculate a prospective sample size for a study, and then actually use a corrected chi-square to analyze the data, per routine use and/or a formal analysis plan, the power of that test will be lower than that which was presumed for the a priori calculation.

It may not make a big difference in some proportion of the cases relative to p <= alpha, but given the idiosyncrasies of the observed data at the end of the study, along with the effective loss of some power, it may very well be relevant to the results and their strict interpretation. It may also impact, to some extent, the a priori planning for the study, relative to the needed target sample size, budgeting and other considerations for a study sponsor.

Is there any logic in adding some notes to ?power.prop.test, to indicate the implied presumption of the use of an un-corrected chi-square test?

Thanks for any comments, including telling me that I need more caffeine and to increase my oxygen uptake...

Regards,

Marc Schwartz
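The numbers in the example above can be reproduced along the following lines. This is a sketch under stated assumptions: the uncorrected figure comes straight from power.prop.test(), while the corrected figure uses a textbook continuity-correction adjustment that is assumed here purely for illustration. The message does not say which formula the other calculators use, so they may differ.

```r
# Inputs taken from the example in the message above.
p1 <- 0.142
p2 <- 0.266

# power.prop.test() lives in the 'stats' package; leaving n unspecified
# makes it solve for the per-group sample size.
res   <- power.prop.test(p1 = p1, p2 = p2, power = 0.80, sig.level = 0.05)
n_unc <- res$n

# ASSUMPTION: a textbook continuity-correction adjustment,
#   n_cc = n/4 * (1 + sqrt(1 + 4 / (n * |p1 - p2|)))^2
# Other calculators may apply a different correction.
delta <- abs(p1 - p2)
n_cc  <- n_unc / 4 * (1 + sqrt(1 + 4 / (n_unc * delta)))^2

# Roughly 165 (uncorrected) and 180 (corrected) per group.
round(c(uncorrected = n_unc, corrected = n_cc))
```

Under this assumed adjustment the gap is about 15 subjects per group, which is in line with the difference described above.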
Re: [Rd] quantile() names
CITED TEXT CONTAINS EXCERPTS ONLY

> and now we read more replies on this topic without anyone looking at
> the pure R source code which is pretty simple and easy.
> Instead, people do experiments and take time to muse about their findings..
> Honestly, I'm disappointed: I've always thought that if you
> *write* on R-devel, you should be able to figure out a few
> things yourself before that..

That's a bit unfair.
Some of us have written packages, containing functions for computing
quantile names:

probhat::ntile.names (,100)

> 1) provide an optional argument 'digits = 7'
>    back compatible w/ default getOption("digits")

I'm not sure I've got this right.
Are you suggesting that by default, names should have 7 digits?

> so I'm guessing it may make more people unhappy than other
> people happy if we change this now, after close to 23 years .. ??

I would probably be in the less enthusiastic group.
I take the view that quantile naming is mainly a convenience, for
summary-style output.

And on that basis, I would say the current behaviour is about right.
Anyone looking for high precision, should probably compute their own
quantile names.

Also, expanding on an earlier point.
The value was 975.025, so a label of "97.5%" could still cause problems.
Increasing the precision doesn't necessarily fix this sort of problem.
But rather, increases the complexity of the output, beyond what
"97.5%" of users would ever want...

B.
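The value 975.025 mentioned above follows from the default quantile definition. A minimal sketch, assuming the default type 7 interpolation and using unname() so that the result does not depend on how the label is formatted:

```r
x <- 1:1000
p <- 0.975

# Default quantile() (type 7) interpolates at position h = (n - 1) * p + 1.
h  <- (length(x) - 1) * p + 1
lo <- floor(h)
by_hand <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])

# unname() drops the percentage label so only the value is compared.
q <- unname(quantile(x, p))

all.equal(q, by_hand)   # TRUE
print(q, digits = 7)    # 975.025
```

So the label describes the probability (97.5%), not the returned value, and the two need not look alike however many digits the label carries.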
Re: [Rd] quantile() names
Sorry, I need to change my last post.

I looked at this a bit more, and realized that increasing the (max)
number of (name) digits is only relevant in some cases.
For people computing quartiles and deciles, this shouldn't make any difference.

Therefore, it should still be convenient for the purposes of summary-style output.

On Thu, Dec 17, 2020 at 11:48 AM Abby Spurdle wrote:
>
> CITED TEXT CONTAINS EXCERPTS ONLY
>
> > and now we read more replies on this topic without anyone looking at
> > the pure R source code which is pretty simple and easy.
> > Instead, people do experiments and take time to muse about their findings..
> > Honestly, I'm disappointed: I've always thought that if you
> > *write* on R-devel, you should be able to figure out a few
> > things yourself before that..
>
> That's a bit unfair.
> Some of us have written packages, containing functions for computing
> quantile names:
>
> probhat::ntile.names (,100)
>
> > 1) provide an optional argument 'digits = 7'
> >    back compatible w/ default getOption("digits")
>
> I'm not sure I've got this right.
> Are you suggesting that by default, names should have 7 digits?
>
> > so I'm guessing it may make more people unhappy than other
> > people happy if we change this now, after close to 23 years .. ??
>
> I would probably be in the less enthusiastic group.
> I take the view that quantile naming is mainly a convenience, for
> summary-style output.
>
> And on that basis, I would say the current behaviour is about right.
> Anyone looking for high precision, should probably compute their own
> quantile names.
>
> Also, expanding on an earlier point.
> The value was 975.025, so a label of "97.5%" could still cause problems.
> Increasing the precision doesn't necessarily fix this sort of problem.
> But rather, increases the complexity of the output, beyond what
> "97.5%" of users would ever want...
>
> B.
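The point about quartiles and deciles can be checked with the formatC() construction quoted earlier in the thread. This is a sketch only: pct_names() is a hypothetical helper (not a base R function) that takes the number of digits explicitly.

```r
# Hypothetical helper (not in base R): the formatC() construction from the
# thread, with 'digits' passed explicitly.
pct_names <- function(probs, digits) {
  paste(formatC(100 * probs, format = "fg", width = 1,
                digits = max(2, digits)),
        "%", sep = "")
}

quartiles <- c(0.25, 0.50, 0.75)
deciles   <- seq(0.1, 0.9, by = 0.1)

identical(pct_names(quartiles, 2), pct_names(quartiles, 7))  # TRUE
identical(pct_names(deciles, 2),   pct_names(deciles, 7))    # TRUE
pct_names(quartiles, 2)                                      # "25%" "50%" "75%"

# Only probabilities that need more than two significant digits change:
pct_names(0.975, 2) != pct_names(0.975, 7)                   # TRUE
```

Percentages that are already exact at two significant digits keep the same names whatever the digits setting, so typical summary-style output would look the same.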