See at end.

On 13-Jul-08 21:42:19, Johannes Huesing wrote:
> Ted Harding <[EMAIL PROTECTED]> [Sun, Jul 13, 2008 at 10:59:21PM CEST]:
>> On 13-Jul-08 19:53:47, Johannes Huesing wrote:
>> > Frank E Harrell Jr <[EMAIL PROTECTED]> [Sun, Jul 13, 2008 at 08:07:37PM CEST]:
>> >> (Ted Harding) wrote:
>> >>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> >>>> [...]
>> >>>> A large P-value means nothing more than needing more data. No
>> >>>> conclusion is possible.
> [...]
>
>> But "absence
>> of evidence", in my interpretation (which I believe is right for
>> the statistical context of "non-significant P-values"), means that
>> we do not know about A: we do not have enough information.
>>
>
> What would the p-value have to be like in your opinion to make the
> null hypothesis look more likely after the experiment than before?
>
>> The proof is, basically, given in terms of a 2-valued logic where
>> every term is either TRUE or FALSE. In the real world we have at
>> least a third possible value: UNKNOWN (or, as R would put it, NA).
>
> How would the probabilities that A is NA be affected by the outcome
> of an experiment like this? If this probability is affected, how
> does this leave the probability that A is T or F unaffected?
>
> Or do you assign the NA status to the data collected?
>
> A high p-value does not always equate that you might as well have
> collected nothing but missing values.
>
> Of course I buy into the notion that a point estimate with a measure
> of accuracy is much better suited to describe your data; but a
> high p-value as a result of a test procedure that can be claimed to
> be adequately powered may defensibly be taken as a hint that we
> can for now stick with the null hypothesis.
> --
> Johannes Hüsing

I shall perhaps try later to respond in more detail to specific points above. But, for the moment, let me say that I think your statement

  "a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis"

is the main key.

The power function of a test (which of course depends on the design of the investigation and on its size, i.e. number of data gathered) is basically much the same (in my mind) as the amount of evidence. A high P-value with a very powerful test serves to exclude all alternatives to the Null Hypothesis except those which lie very close to the Null Hypothesis. In that sense, we do in fact have a lot of evidence against all hypotheses except those which are very similar to the Null. So we are not in an "absence of evidence" situation, and we do have "evidence of absence".

The basic logic of a Hypothesis Test (in its standard sense) is the generalisation, to a logic where certainty is at best probabilistic, of the classical-logic argument:

  Given (as a matter of fact): If A, then B
  Observed:   B is FALSE
  Conclusion: A is FALSE

Probabilistically:

  Given:      If A (H0), then B has high probability
  Observed:   B is FALSE
  Conclusion: An event (not-B) has occurred which has very small
              probability if A is TRUE.

Hence we (as George Barnard used to put it) apply "The Principle of Disbelief in Tall Stories" and disbelieve A to the extent that we disbelieve not-B as a possible outcome from A (H0).

In applications, the event B will be specified in terms of a set of possible values of a Test Statistic T, devised so as to represent an interesting measure of discrepancy between the data and the hypothesis H0 (e.g. the t-statistic for testing whether two samples are drawn from populations with equal means -- if that is the case, then E(T) = 0, and the set of values {abs(T) > T0} will be a "discrepant set").
By choosing T0 to be such that Prob(abs(T) > T0) = p0 under H0, where p0 is a small value which we choose to suit ourselves, we are defining the threshold at which we are prepared to deem that "the claim that abs(T) > T0 is compatible with H0" is too unlikely to be plausible.

The cleanest example in real life can be drawn from the basic principle in criminal law for concluding that an accused person is guilty, namely "The accused is deemed innocent until proved guilty beyond reasonable doubt". What constitutes "reasonable doubt" can become a very interesting question, but there are some crimes for which it has a definite statistical interpretation, typically exceeding some authorised limit (of speed in a vehicle, of alcohol content in the blood while driving a vehicle, of polluting emissions from a factory plant [which in the UK, under the Environmental Protection Act, is a criminal offence]).

In the days when blood alcohol was determined by laboratory analysis of a blood sample, it was possible to determine that the "margin of error" corresponded to a P-value of about 0.001 (i.e. if the lab analysis yielded a result in excess of the legal limit + 3*SE, then the inevitable result was a conviction unless it could be independently proved in defence that the statutory procedures were carried out in a flawed manner). So, in that case, "beyond reasonable doubt" meant "The P-value of the data was about 1/1000 or less".

But, if the lab analysis gave 80mg/100ml (the legal limit in the UK), then at best you can conclude that the result equally favoured any two hypotheses equidistant on either side of the legal limit. But while this constitutes (in the sense explained) absence of evidence for guilt (i.e. alc > 80), it certainly does not exclude it (someone at 81, and therefore truly guilty, could be quite likely to give a result of 80). So the "80" result is not evidence of innocence -- it is merely lack of evidence of guilt.
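
As a rough illustration of how T0 follows from the chosen p0, here is a minimal Python sketch. It is not part of the original exchange, and it assumes that T is standard normal under H0, which is only an approximation for a t-statistic with many degrees of freedom. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: under H0 the test statistic T is standard normal.
STD_NORMAL = NormalDist()

def two_sided_threshold(p0: float) -> float:
    """Return T0 such that Prob(abs(T) > T0) = p0 under H0."""
    return STD_NORMAL.inv_cdf(1.0 - p0 / 2.0)

def one_sided_p_value(k: float) -> float:
    """Return Prob(T > k) under H0, i.e. the chance of a result
    more than k standard errors above the hypothesised value."""
    return 1.0 - STD_NORMAL.cdf(k)

# Thresholds for a few conventional choices of p0.
for p0 in (0.05, 0.01, 0.001):
    print(f"p0 = {p0}: T0 = {two_sided_threshold(p0):.3f}")

# One-sided tail probabilities for margins of 2 and 3 standard errors.
for k in (2, 3):
    print(f"{k} SE above: one-sided P = {one_sided_p_value(k):.5f}")
```

Under this normal assumption a margin of about three standard errors is what brings the one-sided P-value down to roughly 1 in 1000.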

It gets worse with the environmental pollution situation. For the blood alcohol and the lab analysis of a blood sample, the lab procedure is only legally valid if it consistently achieves an SE of determination of 2% or less (taken as 2mg/100ml for results below 100). Thus the power function has

  Power(alc) = 0.001 at alc=80,
  Power(alc) = 0.5   at alc=86,
  Power(alc) = 0.999 at alc=92.

Thus the innocent (alc <= 80) have a good protection against false conviction; the marginally guilty (alc < 86, say) are likely to get away with it; the seriously guilty (alc > 92) are almost certain to be convicted.

However, the kinds of measurement which can be made of, say, atmospheric pollution are subject to SEs which are more like 20% and are often higher (50% or more). To achieve the requisite "beyond reasonable doubt" (since it is a criminal offence) on the same criterion (3*SE above) means that the procedure is only effective when the emission is say twice the permitted level (or even more).

Here we have lack of evidence in a very real sense (the procedure is weak). It would be quite possible for a polluter to emit well above the permitted level, and yet for the sampling to give a result well below the permitted level. Hence, such absence of evidence is certainly not evidence of absence.

And, if I understand correctly, this is pretty much what Frank Harrell meant when he wrote "A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence." [Or "better data", one might add].

But it does need to be qualified (as I try to do above) by consideration of whereabouts on the "effect" scale the procedure becomes capable of doing its job, which in turn brings in issues about the importance (in real life) of the sort of departure from H0 that it is important to detect.
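
The two power functions can be compared with a short Python sketch. This is an illustration only, not a description of any statutory procedure: it assumes a single measurement that is normally distributed about the true level with a known SE, conviction whenever the reading exceeds limit + 3*SE, and (for the pollution case) an SE fixed at 20% of the permitted level. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist

def conviction_power(true_level: float, limit: float, se: float, k: float = 3.0) -> float:
    """Probability that one measurement exceeds limit + k*se.

    Assumes the measurement is normal with mean true_level and
    standard deviation se (an illustrative simplification).
    """
    threshold = limit + k * se
    return 1.0 - NormalDist(mu=true_level, sigma=se).cdf(threshold)

# Blood alcohol: limit 80 mg/100ml, SE 2 mg/100ml, as stated in the text.
for alc in (80, 86, 92):
    print(f"alc = {alc}: power = {conviction_power(alc, 80, 2):.4f}")

# Pollution: permitted level scaled to 1.0, SE assumed to be 0.2 (20%).
for ratio in (1.0, 1.5, 2.0):
    print(f"emission = {ratio} x limit: power = {conviction_power(ratio, 1.0, 0.2):.4f}")
```

Under these assumptions the blood-alcohol figures agree, to rounding, with the 0.001 / 0.5 / 0.999 values quoted above. For the pollution case the threshold sits at 1.6 times the permitted level, so an emitter at 1.5 times the limit would be detected less than a third of the time, and only at about twice the limit does detection become close to certain.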

The blood-alcohol test does a reasonably good job (one is prepared to accept a relatively narrow "grey area" where any conclusion is unclear). The pollution test does not.

Mustn't go on too long!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jul-08                                       Time: 00:16:50
------------------------------ XFMail ------------------------------