Many thanks to Greg L. Snow and David Winsemius for their responses.  

First off I can safely say I don't know enough statistics to be dangerous, but 
hopefully I will get to that point:) 

Regarding the goal - ultimately I would like to use linear regression 
(I am constrained to linear regression at this point) for my data.  I thought 
the requirements for using linear regression were the following (I pulled this 
list from 
www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt):

The assumptions required for utilizing a regression equation are the same as 
the assumptions for the test of significance of a correlation coefficient.
Both variables are interval level.
Both variables are normally distributed.
The relationship between the two variables is linear.
The variance of the values of the dependent variable is uniform for all values 
of the independent variable (equality of variance).

Thus, I was going to attempt to (1) identify which distribution my data most 
closely resembles, (2) transform my data so that it is normal, and (3) then 
use linear regression on the transformed data.  

However, if 
"The assumptions of most regression methods is that the *errors* need to have 
the desired relationship between means and variance, and not that the dependent 
variable be "normal". Many times the apparent non-normality will be "explained" 
or "captured" by the regression model."

Does this mean I can just "do" linear regression without transforming my data 
and it will be okay?  
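
One way to see what the quoted passage is getting at is a small simulation. 
The sketch below is only an illustration on made-up data, not an analysis of 
my actual data: the predictor x, the coefficients 2 and 3, the sample size, and 
the helper "skewness" are all assumptions chosen for the example. The response 
y looks clearly non-normal on its own, yet the residuals from lm are close to 
Gaussian because the skew comes from the predictor, which the model captures.

```r
# Simulated illustration (all names and numbers are made up for this sketch).
set.seed(42)

n <- 1000
x <- rexp(n, rate = 1)             # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 1)  # linear signal plus Gaussian errors

fit <- lm(y ~ x)
res <- residuals(fit)

# Simple moment-based sample skewness (near 0 for a symmetric distribution).
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_y   <- skewness(y)    # skewness of the raw response
skew_res <- skewness(res)  # skewness of the residuals

# Side-by-side normal Q-Q plots: raw response vs. residuals.
op <- par(mfrow = c(1, 2))
qqnorm(y,   main = "Response y"); qqline(y)
qqnorm(res, main = "Residuals");  qqline(res)
par(op)

print(c(skew_y = skew_y, skew_res = skew_res))
print(coef(fit))
```

This does not show that untransformed regression is always fine. It only shows 
that a non-normal looking response does not by itself rule it out, so the 
residuals are the thing to check.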

Note that I was using "lm" from R to obtain the errors (residuals); however, I 
have not had an opportunity to do much analysis of those results to determine 
whether they are Gaussian.   
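
For reference, a minimal sketch of how the residuals from "lm" could be 
inspected. It uses the built-in "cars" data set purely as a stand-in for my 
data (that substitution is an assumption for the example), and the variable 
names fit, res, fv and sw are arbitrary.

```r
# `cars` ships with base R (datasets package); it stands in for the real data.
fit <- lm(dist ~ speed, data = cars)

res <- residuals(fit)  # estimated errors: observed minus fitted values
fv  <- fitted(fit)     # fitted values from the model

# 1. Normal Q-Q plot of the residuals: points close to the line suggest
#    approximately Gaussian errors.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# 2. Residuals vs. fitted values: a roughly even band around zero suggests
#    constant variance; a funnel shape suggests otherwise.
plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 3. Shapiro-Wilk test as a rough numerical companion to the plots.
sw <- shapiro.test(res)
print(sw)
```

Calling plot(fit) on the fitted model also produces a standard set of 
diagnostic plots, including a residuals vs. fitted plot and a normal Q-Q plot.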

I guess I am going to try to track down the following documents:
(1) Statistical Distributions (Paperback)
by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock (Author) 
# ISBN-10: 0471371246
# ISBN-13: 978-0471371243

(2) Regression Modeling Strategies (Hardcover)
by Frank E. Jr. Harrell (Author)
# ISBN-10: 0387952322
# ISBN-13: 978-0387952321

Maybe electronic versions of those documents are available.  My wife is already 
giving me a hard time about the volume of books around.   

Thank you again for all your feedback and insights.  


--- On Fri, 2/13/09, David Winsemius <dwinsem...@comcast.net> wrote:
From: David Winsemius <dwinsem...@comcast.net>
Subject: Re: [R] Website, book, paper, etc. that shows example plots of distributions?
To: jasonkrup...@yahoo.com
Cc: "Gabor Grothendieck" <ggrothendi...@gmail.com>, R-help@r-project.org
Date: Friday, February 13, 2009, 9:10 AM

This is probably the right time to issue a warning about the error of making
transformations on the dependent variable before doing your analysis. The
classic error that newcomers to statistics commit is to decide that they want to
"make their data normal". The assumptions of most regression methods
is that the *errors* need to have the desired relationship between means and
variance, and not that the dependent variable be "normal". Many times
the apparent non-normality will be "explained" or "captured"
by the regression model. Other methods of modeling non-linear dependence are
also available.

I found Harrell's book "Regression Modeling Strategies" to be an
excellent source for alternatives. My copy of V&R's MASS is only the
second edition but chapters 5 & 6 in that edition on linear models also had
examples of using QQ plots on residuals. Checking that text's website I see
that chapter 6 at least is probably similar. They include the scripts from
their chapters along with the MASS package (installed as part of the VR bundle).
My copy is entitled "ch06.R" and resides in the scripts subdirectory:
/Library/Frameworks/R.framework/Versions/2.8/Resources/library/MASS/scripts/ch06.R

--David Winsemius


On Feb 13, 2009, at 8:11 AM, Jason Rupert wrote:

> Thank you very much.  Thank you again regarding the suggestion below.  I
> will give that a shot and I guess I've got my work cut out for me.  I
> counted 45 different distributions.
> 
> Is the best way to get a QQ plot of each to produce a data set for each
> distribution, use the qqplot function to get a QQ plot of that
> distribution, and then compare it with my data distribution?
> 
> As you can tell I am not a trained statistician, so any guidance or
> suggested further reading is greatly appreciated.
> 
> I guess I am pretty sure my data is not normally distributed, based on
> some of the empirical "Goodness of Fit" tests and on comparing the QQ plot
> of my data against the QQ plot of a normal distribution with the same
> number of points.  I guess the next step is to figure out which
> distribution my data most closely matches.
> 
> Also, I guess I could fool around and take the log, sqrt, etc. of my
> data and see if it will then more closely resemble a normal distribution.
> 
> Thank you again for assisting this novice data analyst who is trying to
> gain a better understanding of the techniques using this powerful software
> package.
> 
> 
> 
> 
> --- On Fri, 2/13/09, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> From: Gabor Grothendieck <ggrothendi...@gmail.com>
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of distributions?
> To: jasonkrup...@yahoo.com
> Cc: R-help@r-project.org
> Date: Friday, February 13, 2009, 5:43 AM
> 
> You can readily create a dynamic display for using qqplot and similar
> functions in conjunction with either the playwith or TeachingDemos
> packages.
> 
> For example, to investigate the effect of the shape parameter in the skew
> normal distribution on its qqplot relative to the normal distribution:
> 
>   library(playwith)
>   library(sn)
>   playwith(qqnorm(rsn(100, shape = shape)),
>       parameters = list(shape = seq(-3, 3, .1)))
> 
> Now move the slider located at the bottom of the window that
> appears and watch the plot change in response to changing
> the shape value.
> 
> You can find more distributions here:
> http://cran.r-project.org/web/views/Distributions.html
> 
> On Thu, Feb 12, 2009 at 1:04 PM, Jason Rupert <jasonkrup...@yahoo.com> wrote:
>> By any chance is anyone aware of a website, book, paper, etc., or a
>> combination of those sources, that shows plots of different distributions?
>> 
>> After reading a pretty good whitepaper I became aware of the benefit of
>> doing Q-Q plots and histograms to help assess a distribution.
>> The whitepaper is called:
>> "Univariate Analysis and Normality Test Using SAS, Stata, and
>> SPSS*" , (c) 2002-2008 The Trustees of Indiana University Univariate
>> Analysis and Normality Test: 1, Hun Myoung Park
>> 
>> Unfortunately the white paper does not provide an extensive number of
>> example distributions plotted using Q-Q plots and histograms, so I am
>> curious whether there is a "portfolio"-type website or other whitepaper
>> that shows examples of various types of distributions.
>> 
>> It would be helpful to see a bunch of Q-Q plots and their associated
>> histograms to get an idea of how each distribution looks in comparison
>> with the Gaussian.
>> 
>> I think seeing the plot really helps.
>> 
>> Thank you for any insights.
>> 
>> 
>> 
>>       [[alternative HTML version deleted]]
>> 
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 