[Rd] Printing the null hypothesis

2009-08-16 Thread Liviu Andronic
Dear R developers,
Currently many (all?) test functions in R describe the alternative
hypothesis, but not the null hypothesis being tested. For example,
cor.test:
> require(boot)
> data(mtcars)
> with(mtcars, cor.test(mpg, wt, met="kendall"))

Kendall's rank correlation tau

data:  mpg and wt
z = -5.7981, p-value = 6.706e-09
alternative hypothesis: true tau is not equal to 0
sample estimates:
 tau
-0.72783

Warning message:
In cor.test.default(mpg, wt, met = "kendall") :
  Cannot compute exact p-value with ties


In this example,
H0: (not printed)
Ha: true tau is not equal to 0

This should be fine for advanced users and expert statisticians,
but not for beginners. The help page will also often not explicitly
state the null hypothesis. Personally, I often find myself in front of
an htest object trying to guess what the null must reasonably have
been.
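
(As an aside, much of this information is in fact already stored in
the htest object, even though print() never labels it as a null. A
minimal sketch of digging it out by hand:

ht <- with(mtcars, cor.test(mpg, wt, method="kendall"))
ht$null.value    ## hypothesized value, named after the parameter:
                 ## tau
                 ##   0
ht$alternative   ## "two.sided"
## together these read as H0: true tau equals 0

but expecting beginners to unpick the object like this rather defeats
the purpose of a printed summary.)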

Are there compelling reasons for not printing out the null being
tested, along with the rest of the results? Thank you
Liviu


Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 10:38:40, Liviu Andronic wrote:
> Dear R developers,
> Currently many (all?) test functions in R describe the alternative
> hypothesis, but not the null hypothesis being tested. For example,
> cor.test:
>> require(boot)
>> data(mtcars)
>> with(mtcars, cor.test(mpg, wt, met="kendall"))
> 
>   Kendall's rank correlation tau
> 
> data:  mpg and wt
> z = -5.7981, p-value = 6.706e-09
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
>  tau
> -0.72783
> 
> Warning message:
> In cor.test.default(mpg, wt, met = "kendall") :
>   Cannot compute exact p-value with ties
> 
> 
> In this example,
> H0: (not printed)
> Ha: true tau is not equal to 0
> 
> This should be fine for advanced users and expert statisticians,
> but not for beginners. The help page will also often not explicitly
> state the null hypothesis. Personally, I often find myself in front of
> an htest object trying to guess what the null must reasonably have
> been.
> 
> Are there compelling reasons for not printing out the null being
> tested, along with the rest of the results? Thank you
> Liviu

I don't know about *compelling* reasons! But (as a general rule)
if the Alternative Hypothesis is stated, then the Null Hypothesis
is simply its negation. So, in your example, you can infer

  H0: true tau equals 0
  Ha: true tau is not equal to 0.

I don't think one needs to be an advanced user or expert statistician
to see this -- it is part of the basic understanding of hypothesis
testing.

Some people might regard the "H0" statement as simply redundant!

The "Ha" statement is, however, essential, since different alternatives
may be adopted depending on the application, such as

  Ha: true tau is greater than 0
  (implicit: true tau <= 0)

or
  Ha: true tau is less than 0
  (implicit: true tau >= 0)
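
For instance (with made-up data, purely to show how the printed "Ha"
line tracks the chosen alternative):

  set.seed(1)
  x <- rnorm(20); y <- x + rnorm(20)
  cor.test(x, y, method="kendall", alternative="greater")
  ## prints: alternative hypothesis: true tau is greater than 0
  ## so the implicit null is: true tau <= 0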

Hoping this helps,
Ted.





Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Liviu Andronic
Hello,

On 8/16/09, Ted Harding  wrote:
> I don't know about *compelling* reasons! But (as a general rule)
>  if the Alternative Hypothesis is stated, then the Null Hypothesis
>  is simply its negation. So, in your example, you can infer
>
>   H0: true tau equals 0
>   Ha: true tau is not equal to 0.
>
Oh, I had a slightly different H0 in mind. In the given example,
cor.test(..., met="kendall") would test "H0: x and y are independent",
but cor.test(..., met="pearson") would test "H0: x and y are not
correlated (or `are linearly independent')".
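
(Compare the two print-outs side by side; only the "Ha" line appears,
worded per method:

with(mtcars, cor.test(mpg, wt, method="pearson"))
## alternative hypothesis: true correlation is not equal to 0
with(mtcars, cor.test(mpg, wt, method="kendall"))
## alternative hypothesis: true tau is not equal to 0

so the differing nulls are left for the user to infer.)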

To take a different example, a test of normality.
> shapiro.test(mtcars$wt)

Shapiro-Wilk normality test

data:  mtcars$wt
W = 0.9433, p-value = 0.09265

Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
least to beginners, these things are not always perfectly clear (even
after reading the documentation), and when interpreting the results it
can prove useful to have on-screen information about the null.


Thank you for answering
Liviu



Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 14:06:18, Liviu Andronic wrote:
> Hello,
> On 8/16/09, Ted Harding  wrote:
>> I don't know about *compelling* reasons! But (as a general rule)
>>  if the Alternative Hypothesis is stated, then the Null Hypothesis
>>  is simply its negation. So, in your example, you can infer
>>
>>   H0: true tau equals 0
>>   Ha: true tau is not equal to 0.
>>
> Oh, I had a slightly different H0 in mind. In the given example,
> cor.test(..., met="kendall") would test "H0: x and y are independent",
> but cor.test(..., met="pearson") would test "H0: x and y are not
> correlated (or `are linearly independent')".

Ah, now you are playing with fire! What the Pearson, Kendall and
Spearman coefficients in cor.test measure is *association*. OK, if
the results clearly indicate association, then the variables are
not independent. But it is possible to have two variables x, y
which are definitely not independent (indeed one is a function of
the other) but which yield zero association by any of these measures.

Example:
  x <-  (-10:10) ; y <- x^2 - mean(x^2)
  cor.test(x,y,method="pearson")
  #   Pearson's product-moment correlation
  # t = 0, df = 19, p-value = 1
  # alternative hypothesis: true correlation is not equal to 0 
  # sample estimates: cor 0
  cor.test(x,y,method="kendall")
  #   Kendall's rank correlation tau
  # z = 0, p-value = 1
  # alternative hypothesis: true tau is not equal to 0 
  # sample estimates: tau 0
  cor.test(x,y,method="spearman")
  #  Spearman's rank correlation rho
  # S = 1540, p-value = 1
  # alternative hypothesis: true rho is not equal to 0 
  # sample estimates: rho 0

If, for instance, "method=kendall" were to announce that it is
testing "H0: x and y are independent", it would seriously mislead
the reader!

> To take a different example, a test of normality.
>> shapiro.test(mtcars$wt)
> 
>   Shapiro-Wilk normality test
> 
> data:  mtcars$wt
> W = 0.9433, p-value = 0.09265
> 
> Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
> least to beginners, these things are not always perfectly clear (even
> after reading the documentation), and when interpreting the results it
> can prove useful to have on-screen information about the null.
> 
> Thank you for answering
> Liviu

This is possibly a more discussable point, in that even if you know
what the Shapiro-Wilk statistic is, it is not obvious what it is
sensitive to, and hence what it might be testing for. But I doubt
that someone would be led to try the Shapiro-Wilk test in the
first place unless they were aware that it was a test for normality,
and indeed this is announced in the first line of the response.
The alternative, therefore, is "non-normality".

As to the contrast between the absence of an "Ha" statement for the
Shapiro-Wilk test and its presence in cor.test(), this comes back to
the point I made earlier: cor.test() offers you three alternatives
to choose from: "two-sided" (default), "greater", "less". This
distinction can be important, and when cor.test() reports "Ha" it
tells you which one was used.

On the other hand, as far as Shapiro-Wilk is concerned there is
no choice of alternatives (nor of anything else except the data x).
So there is nothing to tell you! And, further, departure from
normality has so many "dimensions" that alternatives like "two
sided", "greater" or "less" would make no sense. One can think of
tests targeted at specific kinds of alternative such as "Distribution
is excessively skew" or "distribution has excessive kurtosis" or
"distribution is bimodal" or "distribution is multimodal", and so on.
But any of these can be detected by Shapiro-Wilk, so it is not
targeted at any specific alternative.
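
For instance (simulated data, so the exact p-values will vary with
the seed):

  set.seed(42)
  shapiro.test(rexp(100))                       # strongly skewed
  shapiro.test(c(rnorm(50, -3), rnorm(50, 3)))  # bimodal
  ## both give very small p-values: quite different kinds of
  ## departure are picked up by the same statistic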

Best wishes,
Ted.





Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Liviu Andronic
On 8/16/09, Ted Harding  wrote:
>  > Oh, I had a slightly different H0 in mind. In the given example,
>  > cor.test(..., met="kendall") would test "H0: x and y are independent",
>  > but cor.test(..., met="pearson") would test "H0: x and y are not
>  > correlated (or `are linearly independent')".
>
>
> Ah, now you are playing with fire! What the Pearson, Kendall and
>  Spearman coefficients in cor.test measure is *association*. OK, if
> the results clearly indicate association, then the variables are
>  not independent. But it is possible to have two variables x, y
>  which are definitely not independent (indeed one is a function of
>  the other) but which yield zero association by any of these measures.
>
>  Example:
>   x <-  (-10:10) ; y <- x^2 - mean(x^2)
>   cor.test(x,y,method="pearson")
>   #   Pearson's product-moment correlation
>   # t = 0, df = 19, p-value = 1
>   # alternative hypothesis: true correlation is not equal to 0
>   # sample estimates: cor 0
>   cor.test(x,y,method="kendall")
>
>   #   Kendall's rank correlation tau
>
>   # z = 0, p-value = 1
>   # alternative hypothesis: true tau is not equal to 0
>   # sample estimates: tau 0
>   cor.test(x,y,method="spearman")
>   #  Spearman's rank correlation rho
>   # S = 1540, p-value = 1
>   # alternative hypothesis: true rho is not equal to 0
>   # sample estimates: rho 0
>
>  If, for instance, "method=kendall" were to announce that it is
>  testing "H0: x and y are independent", it would seriously mislead
>  the reader!
>
I did take the null statement from the description of
Kendall::Kendall() ("Computes the Kendall rank correlation and its
p-value on a two-sided test of H0: x and y are independent."). Here,
perhaps "monotonically independent" (as opposed to "functionally
independent") would have been more appropriate.

Still, this very example seems to support my original idea: users can
easily get confused about the exact null of a test. Does it test
for "association" or for "no association", for "normality" or for
"lack of normality"? Printing a precise and appropriate statement of
the null would help in interpreting the results, and in avoiding
misinterpreting them.
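
As a rough sketch of what I have in mind (the helper name and wording
are purely illustrative, not a proposed patch to print.htest):

print_with_null <- function(ht) {
  print(ht)
  if (!is.null(ht$null.value))
    cat("null hypothesis: true", names(ht$null.value),
        "equals", ht$null.value, "\n")
}
print_with_null(with(mtcars, cor.test(mpg, wt, method="kendall")))
## ...the usual print-out, followed by:
## null hypothesis: true tau equals 0

For tests that carry no null.value component (shapiro.test, for one)
the wording would still have to come from the test itself, which is
really my point.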



>  > Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
>  > least to beginners, these things are not always perfectly clear (even
>  > after reading the documentation), and when interpreting the results it
>  > can prove useful to have on-screen information about the null.
>
> This is possibly a more discussable point, in that even if you know
>  what the Shapiro-Wilk statistic is, it is not obvious what it is
>  sensitive to, and hence what it might be testing for. But I doubt
>  that someone would be led to try the Shapiro-Wilk test in the
>  first place unless they were aware that it was a test for normality,
>  and indeed this is announced in the first line of the response.
>  The alternative, therefore, is "non-normality".
>
To be particularly picky, as statistics demands, this is not so
obvious from the print-out. For the Shapiro-Wilk test one could indeed
deduce that since it is a "test of normality", the null tested is
"H0: data is normal". This would not hold for, say, the Pearson
correlation. In loose language, it estimates and tests for
"correlation"; in more statistically appropriate language, it tests
for "no correlation" (or for "no association"). It feels to me that
without appropriate indicators, one can easily end up playing with
fire.



>  As to the contrast between the absence of an "Ha" statement for the
>  Shapiro-Wilk test and its presence in cor.test(), this comes back to
>  the point I made earlier: cor.test() offers you three alternatives
>  to choose from: "two-sided" (default), "greater", "less". This
>  distinction can be important, and when cor.test() reports "Ha" it
>  tells you which one was used.
>
>  On the other hand, as far as Shapiro-Wilk is concerned there is
>  no choice of alternatives (nor of anything else except the data x).
>  So there is nothing to tell you! And, further, departure from
>  normality has so many "dimensions" that alternatives like "two
>  sided", "greater" or "less" would make no sense. One can think of
>  tests targeted at specific kinds of alternative such as "Distribution
>  is excessively skew" or "distribution has excessive kurtosis" or
>  "distribution is bimodal" or "distribution is multimodal", and so on.
>  But any of these can be detected by Shapiro-Wilk, so it is not
>  targeted at any specific alternative.
>
Thank you for these explanations. Best
Liviu



Re: [Rd] eurodist example dataset is malformed

2009-08-16 Thread Jari Oksanen
Justin,

I suggest you try to remove your malformed eurodist and use the one in R.
The svn logs show no changes in eurodist since 2005, when 'r' was added to
'Gibralta' (it still has all the wrong distances, which perhaps go back to
the poor quality of the Cambridge Encyclopaedia). I also installed R 2.9.1
for MacOS to confirm that there is no change in 'eurodist' in the Mac
distribution either. My virgin eurodist on the Mac was clean, with all its
errors. All this hints that you have a local copy of a malformed eurodist
on your computer. Perhaps

rm(eurodist)
eurodist

will help.
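
If that does not cure it, plain utils::find() should show where the
stray copy lives:

find("eurodist")
## lists every environment on the search path holding an object
## called 'eurodist'; ".GlobalEnv" ahead of "package:datasets"
## means a local copy is masking the real one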

Cheers, Jari Oksanen


On 15/08/09 06:13 AM, "Justin Donaldson"  wrote:

> Here's my osx data/session info (identical after a re-install):
> 
>> class(eurodist)
> [1] "data.frame"
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-apple-darwin8.11.1
> 
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>> 
> 
> -Justin
> 
> 
> 
> On Thu, Aug 13, 2009 at 4:48 AM, Gavin Simpson wrote:
> 
>> On Wed, 2009-08-12 at 20:26 -0400, Justin Donaldson wrote:
>>> The eurodist dataset (my favorite for mds) is malformed.  Instead of a
>>> standard distance matrix, it's a data frame.  The rownames have gotten
>>> 'bumped' to a new anonymous dimension "X".   It's possible to fix the
>> data,
>>> but it messes up a lot of example code out there.
>>> 
>>>           X Athens Barcelona Brussels Calais ...
>>> 1    Athens      0      3313     2963   3175
>>> 2 Barcelona   3313         0     1318   1326
>>> 3  Brussels   2963      1318        0    204
>>> 4    Calais   3175      1326      204      0
>>> 5 Cherbourg   3339      1294      583    460
>>> 6   Cologne   2762      1498      206    409
>>> ...
>>> 
>>> Best,
>>> -Justin
>> 
>> What version of R, platform, loaded packages etc? This is not what I see
>> on Linux, 2.9.1-patched r49104.
>> 
>>> class(eurodist)
>> [1] "dist"
>>> sessionInfo()
>> R version 2.9.1 Patched (2009-08-07 r49104)
>> x86_64-unknown-linux-gnu
>> 
>> locale:
>> 
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;
>> LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
>> LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods
>> base
>> 
>> loaded via a namespace (and not attached):
>> [1] tools_2.9.1
>> 
>> Have you tried this in a clean session to see if it persists there?
>> 
>> If you can reproduce this in a clean session with an up-to-date R or
>> R-Devel then send details of your R back to the list for further
>> investigation.
>> 
>> HTH
>> 
>> G



[Rd] R CMD check --use-valgrind doesn't run valgrind on tests

2009-08-16 Thread Charles Geyer
R CMD check --use-valgrind  used to run valgrind on the
tests in the tests directory of the package, but it seems to have stopped:
R-2.9.1 doesn't -- at least on my box -- and neither does R-2.10.0 (devel).
I am not sure when this stopped; I think 2.8.x still did it. The only old
R I have around is 2.6.0, and it certainly does.

R CMD check --help for 2.9.1 says (among other things)

--use-valgrinduse 'valgrind' when running examples/tests/vignettes

so the documentation seems to say that the old behavior should also be
the current behavior, but it isn't -- at least on my box.

oak$ cat /etc/SuSE-release
openSUSE 11.0 (X86-64)
VERSION = 11.0
oak$ valgrind --version
valgrind-3.3.0
oak$ gcc --version
gcc (SUSE Linux) 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]

If this is just a stupid user problem and not a bug, how do I get the old
behavior (valgrind run on the tests)? BTW, valgrind is run on the examples
under 2.9.0, as cat .Rcheck/-Ex.Rout shows.
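
In the meantime I am running valgrind on individual test files by
hand, along the lines of (file name made up)

oak$ R -d valgrind --vanilla < mypkg/tests/mytest.R

which is the form that Writing R Extensions documents for running R
under valgrind -- but that is no substitute for R CMD check doing it.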
-- 
Charles Geyer
Professor, School of Statistics
University of Minnesota
char...@stat.umn.edu
