Re: [Rd] quantile() names

2020-12-16 Thread Martin Maechler
> Gabriel Becker 
> on Mon, 14 Dec 2020 13:23:00 -0800 writes:

> Hi Edgar, I certainly don't think quantile(x, .975) should
> return 980, as that is a completely wrong answer.

> I do agree that it seems like the name is a bit
> off-putting. I'm not sure how deep in the machinery you'd
> have to go to get digits to have no effect on the names (I
> don't have time to dig in right this second).

> On the other hand, though, if we're going to make the
> names not respect digits entirely, what do we do when
> someone does quantile(x, 1/3)? That'd be a bad time had by
> all without digits coming to the rescue, I think.

> Best, ~G

and now we read more replies on this topic without anyone looking at
the pure R source code which is pretty simple and easy.
Instead, people do experiments and take time to muse about their findings..

Honestly, I'm disappointed: I've always thought that if you
*write* on R-devel, you should be able to figure out a few
things yourself before that..

It's not rocket science to see/know that you need to quickly look at
the quantile.default() method function and then to note
that it's format_perc(.) which is used to create the names.

Almost surely, I've been a bit involved in creating parts of
this and am probably responsible for the current default
behavior.

   
   (sounds of digging) ...

--> Yes:


r837 | maechler | 1998-03-05 12:20:37 +0100 (Thu, 05 Mar 1998) | 2 lines
Changed paths:
   M /trunk/src/library/base/R/quantile
   M /trunk/src/library/base/man/quantile.Rd

fixed names(.) construction


With this diff  (my 'svn-diffB -c837 quantile') :
Index: quantile
===
21c21,23
<   names(qs) <- paste(round(100 * probs), "%", sep = "")
---
>   names(qs) <- paste(formatC(100 * probs, format= "fg", wid=1,
>  dig= max(2,.Options$digits)),
>  "%", sep = "")
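The names construction introduced by this diff can be reproduced in a few lines. This is a sketch only: `perc_names()` is a hypothetical helper that mirrors the formatC() call above (with getOption("digits") standing in for the old .Options$digits); it does not claim to be the current format_perc() code.

```r
# Hypothetical helper mirroring the r837 names construction:
# formatC() with format = "fg" counts *significant* digits, and the
# count is taken from the 'digits' setting but never drops below 2.
perc_names <- function(probs, digits = getOption("digits")) {
  paste0(formatC(100 * probs, format = "fg", width = 1,
                 digits = max(2, digits)),
         "%")
}

perc_names(0.975, digits = 2)  # "98%", the name Ed saw with options(digits = 2)
perc_names(0.975, digits = 7)  # "97.5%", with the default digits = 7
perc_names(1/3, digits = 7)    # a longer label, as in Gabriel's 1/3 example
```

The `max(2, digits)` guard is what keeps a very small global setting from collapsing every name to a single digit.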

-
so this was before the code was modularized into the format_perc()
utility, and quite a while before R 1.0.0.

Now, 22.8 years later, I do think that it was not necessarily
the best idea to make the names() construction depend entirely on the
'digits' option, guarding it only by always using at least 2 digits.

What I think is better is to

1) provide an optional argument 'digits = 7',
   back compatible with the default getOption("digits"), which is 7

2) when it is used, check that it is at least 1
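A rough sketch of what such an argument could look like, written as a user-level wrapper rather than as a change to quantile.default(). The wrapper name `quantile_digits()` is hypothetical. It computes unnamed quantiles via quantile(..., names = FALSE) and builds the names with formatC(); 'digits' defaults to 7 and is checked to be at least 1.

```r
# Hypothetical wrapper: names depend on an explicit 'digits' argument
# rather than on the global 'digits' option.
quantile_digits <- function(x, probs = seq(0, 1, 0.25), digits = 7, ...) {
  # Point 2) above: refuse digit counts below 1.
  stopifnot(is.numeric(digits), length(digits) == 1, digits >= 1)
  qs <- quantile(x, probs = probs, names = FALSE, ...)
  names(qs) <- paste0(formatC(100 * probs, format = "fg", width = 1,
                              digits = digits), "%")
  qs
}

op <- options(digits = 2)               # a global setting as in Ed's example
names(quantile_digits(1:1000, 0.975))   # the name no longer follows the option
options(op)
```

Users who liked the short names could still ask for them explicitly, e.g. `quantile_digits(x, 0.975, digits = 2)`.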

But then the output of some people's scripts / examples *will* change,
e.g., because they prefer to have a global setting of digits=5,

so I'm guessing it may make more people unhappy than other
people happy if we change this now, after close to 23 years  .. ??

Martin

--
Martin Maechler
ETH Zurich  and  R Core team


> On Mon, Dec 14, 2020 at 11:55 AM Merkle, Edgar C. wrote:

>> All,
>> 
>> Consider the code below
>> 
>> options(digits=2)
>> x <- 1:1000
>> quantile(x, .975)

>> The value returned is 975 (the 97.5th percentile), but
>> the name has been shortened to "98%" due to the digits
>> option. Is this intended? I would have expected the name
>> to also be "97.5%" here. Alternatively, the returned
>> value might be 980 in order to match the name of "98%".
>> 
>> Best, Ed
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] power.prop.test() documentation question

2020-12-16 Thread Marc Schwartz via R-devel
Hi All,

Based upon a discussion on power/sample size calculations on another, non-R
related, list, some light bulbs went on regarding the assumptions about which type
of statistical test is going to be used with various power/sample size
calculators/functions for proportions. In some cases, this is clearly stated;
in others, it is not.

In the case of power.prop.test(), comparing its outputs against other
calculators suggests an implied presumption that an uncorrected
chi-square test will be used in the 2x2 case, as opposed to a corrected
chi-square test or the Fisher Exact Test (FET). Given similar inputs, sample
sizes for the uncorrected chi-square test will generally be smaller than those
for either the corrected chi-square test or the FET, while the latter two, not
surprisingly given their common conservative bias, will yield similar sample
sizes.

This is not explicitly documented in ?power.prop.test, though it is in some 
other applications, as noted above. 

As a particular example from the other discussions, using p1 = 0.142, p2 =
0.266, with power = 0.8 and sig.level = 0.05, power.prop.test() yields a sample
size of ~165 per group. Other calculators that presume either a corrected
chi-square test or the FET yield ~180 per group.
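The two figures can be roughly reproduced as follows. power.prop.test() is the stats function in question; the corrected figure applies the Fleiss continuity-correction approximation, n_cc = (n/4) * (1 + sqrt(1 + 4/(n * |p2 - p1|)))^2, to the uncorrected n. That this is what the other calculators do is an assumption, not something stated above.

```r
# Sample size per group as computed by power.prop.test() (stats package).
res <- power.prop.test(p1 = 0.142, p2 = 0.266, power = 0.8, sig.level = 0.05)
n_uncorrected <- res$n

# Assumed continuity correction (Fleiss approximation) applied to that n;
# this step is not part of power.prop.test() itself.
d <- abs(0.266 - 0.142)
n_corrected <- (n_uncorrected / 4) * (1 + sqrt(1 + 4 / (n_uncorrected * d)))^2

ceiling(c(uncorrected = n_uncorrected, corrected = n_corrected))
```

With these inputs the two values should land close to the ~165 and ~180 per group quoted above.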

I raise this issue because, should one use the function to calculate a prospective
sample size for a study and then actually use a corrected chi-square test to
analyze the data (per routine use and/or a formal analysis plan), the power of
that test will be lower than that presumed for the a priori calculation. It may
not make a big difference in some proportion of cases relative to p <= alpha,
but given the idiosyncrasies of the observed data at the end of the study, along
with the effective loss of some power, it may very well be relevant to the
results and their strict interpretation. It may also affect, to some extent, the
a priori planning for the study, in terms of the needed target sample size,
budgeting and other considerations for a study sponsor.

Is there any logic in adding some notes to ?power.prop.test, to indicate the 
implied presumption of the use of an un-corrected chi-square test? 

Thanks for any comments, including telling me that I need more caffeine and to 
increase my oxygen uptake...

Regards,

Marc Schwartz



Re: [Rd] quantile() names

2020-12-16 Thread Abby Spurdle
CITED TEXT CONTAINS EXCERPTS ONLY

> and now we read more replies on this topic without anyone looking at
> the pure R source code which is pretty simple and easy.
> Instead, people do experiments and take time to muse about their findings..
> Honestly, I'm disappointed: I've always thought that if you
> *write* on R-devel, you should be able to figure out a few
> things yourself before that..

That's a bit unfair.
Some of us have written packages containing functions for computing
quantile names:

 probhat::ntile.names (,100)


> 1) provide an optional argument   'digits = 7'
>back compatible w/ default getOption("digits")

I'm not sure I've got this right.
Are you suggesting that by default, names should have 7 digits?


> so I'm guessing it may make more people unhappy than other
> people happy if we change this now, after close to 23 years  .. ??

I would probably be in the less enthusiastic group.
I take the view that quantile naming is mainly a convenience, for
summary-style output.

And on that basis, I would say the current behaviour is about right.
Anyone looking for high precision should probably compute their own
quantile names.
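For anyone who does want their own labels, one base-R route is to request unnamed values and attach names yourself. A minimal sketch; the fixed two-decimal sprintf() format is an arbitrary choice for illustration, and probhat::ntile.names is not used here.

```r
x <- 1:1000
probs <- c(0.025, 0.5, 0.975)

# names = FALSE skips the digits-dependent name construction entirely.
qs <- quantile(x, probs, names = FALSE)

# Label with a fixed number of decimals, independent of options("digits").
names(qs) <- sprintf("%.2f%%", 100 * probs)
qs
```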


Also, expanding on an earlier point:
the value was 975.025, so a label of "97.5%" could still cause problems.
Increasing the precision doesn't necessarily fix this sort of problem;
rather, it increases the complexity of the output beyond what
"97.5%" of users would ever want...


B.



Re: [Rd] quantile() names

2020-12-16 Thread Abby Spurdle
Sorry, I need to change my last post.

I looked at this a bit more, and realized that increasing the (max)
number of (name) digits is only relevant in some cases.
For people computing quartiles and deciles, this shouldn't make any difference.
Therefore, it should still be convenient for the purposes of summary-style output.
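This can be checked against the names construction from the 1998 diff: for quartiles and deciles the labels are identical whether digits is 2 or 7, and only a probability like 0.975 is affected. `old_names()` is a hypothetical helper reproducing that formatC() call, so the check does not depend on how any particular R version builds quantile() names.

```r
# Hypothetical helper reproducing the r837 formatC() names construction
# (format = "fg", at least 2 significant digits).
old_names <- function(probs, digits) {
  paste0(formatC(100 * probs, format = "fg", width = 1,
                 digits = max(2, digits)), "%")
}

quartiles <- seq(0, 1, 0.25)
deciles   <- seq(0, 1, 0.1)

# Same names with digits = 2 and digits = 7 for "round" probabilities ...
identical(old_names(quartiles, 2), old_names(quartiles, 7))
identical(old_names(deciles, 2),   old_names(deciles, 7))
# ... but not for 0.975, the case from the original question.
identical(old_names(0.975, 2),     old_names(0.975, 7))
```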

