On Feb 14, 2009, at 6:48 PM, Jason Rupert wrote:

> Many thanks to Greg L. Snow and David Winsemius for their responses.
>
> First off I can safely say I don't know enough statistics to be
> dangerous, but hopefully I will get to that point:)
>
> Regarding the goal - ultimately I would like to use linear
> regression (constrained for using linear regression at this point)
> for my data. I thought the requirements for using linear regression
> was the following (I pulled this list from
> www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt):
>
> The assumptions required for utilizing a regression equation are the
> same as the assumptions for the test of significance of a
> correlation coefficient.
> Both variables are interval level.
> Both variables are normally distributed.
> The relationship between the two variables is linear.
> The variance of the values of the dependent variable is uniform for
> all values of the independent variable (equality of variance).
>
> Thus, I was going to attempt to (1) identify which distribution my
> data most closely represents, (2) translate my data so that it is
> normal, and (3) then use linear regression on the data.
>
> However, if
> "The assumptions of most regression methods is that the *errors*
> need to have the desired relationship between means and variance,
> and not that the dependent variable be "normal". Many times the
> apparent non-normality will be "explained" or "captured" by the
> regression model."
>
> Does this mean I can just "do" linear regression without translating
> my data and it will be okay?

Not exactly. It does mean that you can "just do" linear regression, but you should then check whether "it was OK". The fitted regression object contains the residuals, and these can be displayed in a scatterplot (against each individual predictor variable) or in a QQ plot.

> Note that I was using "lm" from R to access the errors, however, I
> had not an opportunity to do much analysis of those results to
> determine if they are Gaussian or not.
>
> I guess I am going to try to track down the following documents:
> (1) Statistical Distributions (Paperback)
> by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock
> (Author)
> # ISBN-10: 0471371246
> # ISBN-13: 978-0471371243
>
> (2) Regression Modeling Strategies (Hardcover)
> by Frank E. Jr. Harrell (Author)
> # ISBN-10: 0387952322
> # ISBN-13: 978-0387952321
>
> Maybe electronic versions of those documents are available. My wife
> is already giving me a hard time the volume of books around.

Frank Harrell's website has a lot of material that he makes available online:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

snipped remainder

--
David Winsemius

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
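
For anyone following the thread, the residual checks suggested in the reply might look something like the sketch below. It is only an illustration: the data are simulated, and the names `x`, `y`, `fit`, and `res` are made up for the example rather than taken from the original poster's analysis. The predictor is deliberately skewed so that neither variable is normally distributed, while the errors are.

```r
# Hypothetical simulated data: a skewed predictor with Gaussian errors.
set.seed(1)
x <- rexp(200)                 # predictor is clearly non-normal
y <- 2 + 3 * x + rnorm(200)    # response inherits the skew; errors are normal

# Fit the model first, then inspect the residuals it stores.
fit <- lm(y ~ x)
res <- residuals(fit)

# Residuals versus the predictor: look for curvature or a funnel shape.
plot(x, res, xlab = "x", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal QQ plot of the residuals: points near the line suggest
# the Gaussian-error assumption is reasonable.
qqnorm(res)
qqline(res)
```

With several predictors, the scatterplot step would be repeated once per predictor. Calling `plot(fit)` also produces a standard set of diagnostic plots, including residuals versus fitted values and a normal QQ plot.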