spencerg wrote:
Frank E Harrell Jr wrote:
spencerg wrote:
Dear Frank, et al.:
Frank E Harrell Jr wrote:
<snip>
Yes; I do see a normal distribution about once every 10 years.
To what do you attribute the non-normality you see in most cases?
(1) Unmodeled components of variance that can generate
errors in interpretation if ignored, even with bootstrapping?
(2) Honest outliers that do not relate to the phenomena of
interest and would be better removed through improved checks on data
quality, but where bootstrapping is appropriate (provided the data
are not also contaminated with (1))?
(3) Situations where the physical application dictates a
different distribution such as binomial, lognormal, gamma, etc.,
possibly also contaminated with (1) and (2)?
I've fit mixtures of normals to data before, but one needs to be
careful about not carrying that to extremes, as the mixture may be a
result of (1) and therefore not replicable.
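For concreteness, a toy version of that in R (the data are simulated,
and the two-component fit uses the contributed mixtools package; both
are illustrative assumptions, not a recommendation):

## Two lots with shifted means, pooled as if they were one sample
set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(100, mean = 3))

library(mixtools)            # contributed package, assumed installed
fit <- normalmixEM(x, k = 2) # EM fit of a two-component normal mixture
fit$lambda                   # estimated mixing proportions
fit$mu                       # estimated component means
## Here the "mixture" is really an unmodeled between-lot component of
## variance, so the fitted components need not replicate in new data.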
George Box once remarked that he thought most designed
experiments involved split-plot structure that had been ignored in the
analysis. That is only a special case of (1).
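A toy simulation in base R along the lines of that remark (all
numbers invented): treatment is applied to whole lots, the lot effect
is real, and there is no true treatment effect.

set.seed(2)
lot   <- gl(8, 5)                 # 8 lots (whole plots), 5 units each
treat <- gl(2, 20)                # treatment applied to whole lots
y     <- rnorm(8, sd = 2)[lot] +  # between-lot component of variance
         rnorm(40)                # within-lot noise

## Ignoring the lot effect treats 40 units as independent
## (anti-conservative for the treatment test):
summary(lm(y ~ treat))

## Acknowledging lots as whole plots tests treatment against
## lot-to-lot variation:
summary(aov(y ~ treat + Error(lot)))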
Thanks,
Spencer Graves
Spencer,
Those are all important reasons for non-normality of marginal
distributions. But the biggest reason of all is that the underlying
process did not know about the normal distribution. Normality in raw
data is usually an accident.
Frank:
Might there be a difference between the physical and social
sciences on this issue?
Hi Spencer,
I doubt that the difference is large, but biological measurements seem
to be more of a problem.
The central limit effect works pretty well with many kinds of
manufacturing data, except that it is often masked by between-lot
components of variance.

The first differences in log(prices) are often
long-tailed and negatively skewed. Standard GARCH and similar models
handle the long tails well but miss the skewness, at least in what I've
seen. I think that can be fixed, but I have not yet seen it done.
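One possible route, sketched here only as an illustration and not as
something I've seen validated on real price data: allow a skewed
conditional distribution, e.g., the skew Student-t in the contributed
fGarch package (the simulated series below merely stands in for real
log-price differences):

library(fGarch)                # contributed package, assumed installed
set.seed(3)
r <- rt(1000, df = 5)          # heavy-tailed stand-in for returns

## Usual symmetric fit:
fit.norm <- garchFit(~ garch(1, 1), data = r, trace = FALSE)

## Skew Student-t conditional distribution adds a skewness parameter:
fit.sstd <- garchFit(~ garch(1, 1), data = r, cond.dist = "sstd",
                     trace = FALSE)
coef(fit.sstd)["skew"]         # xi = 1 corresponds to symmetry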
The central limit theorem in and of itself doesn't help here because it
doesn't tell you how large N must be before the normal approximation is
adequate.
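That is easy to check by simulation; the lognormal and the sample
sizes below are arbitrary choices:

## How fast do means of a skewed distribution approach normality?
set.seed(4)
for (n in c(5, 30, 200)) {
  m <- replicate(5000, mean(rlnorm(n)))
  cat("n =", n, "  skewness of sample means =",
      round(mean(scale(m)^3), 2), "\n")
}
## qqnorm(m) at each n shows the remaining right skew; how large N
## must be depends entirely on the underlying distribution.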
Social science data, however, often involve discrete scales where
the raters' interpretations of the scales rarely match any standard
distribution. Transforming to latent variables, e.g., via factor
analysis, may help but does not eliminate the problem.
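As a sketch of what I mean (five simulated 5-point items standing in
for real ratings; the cutpoints are invented):

set.seed(5)
n     <- 500
theta <- rnorm(n)                 # latent trait
items <- sapply(1:5, function(j)  # five coarse 5-point ratings
  findInterval(theta + rnorm(n), c(-1.5, -0.5, 0.5, 1.5)) + 1)

fit <- factanal(items, factors = 1, scores = "regression")
sc  <- fit$scores[, 1]
## The factor scores are smoother than any single item, but as a
## linear combination of discrete items they are still discrete:
qqnorm(sc)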
Good example. Many of the scales I've seen are non-normal or even
multi-modal.
Thanks for your comments.
Thanks for yours.
Frank
Spencer
Frank
--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
Vanderbilt University School of Medicine