On Jun 21, 2011, at 3:49 AM, George Markomanolis wrote:
> Dear all,
>
> I am new to this field and I have a question about linear regression.
> I have a dataset of around 31000 points and I want to fit a linear
> regression. The R-squared is 0.9; however, when I check the diagnostic
> plots I can see that there are around 250 points with high leverage.
> As far as I know, high-leverage points can strongly influence the fit.
> If I remove these points to check their influence, the R-squared for
> the remaining points is 0.71. So I removed less than 1% of my data and
> the fit is no longer as good. Could you please give me any advice
> about this? Is it right to keep these 250 points in my dataset or not?
> Could I do something else? The data were measured in an experiment,
> so even these 250 points are real values.
You could start by looking at descriptive statistics for those points.
Perhaps they sit at one end of a variable's range, or perhaps they
share some other feature that is scientifically interesting. So far you
have only examined one set of simple linear hypotheses and have not
(presumably) looked at any non-linear possibilities or at the
potential for interactions to affect the outcome. The prior
science of your (so far undescribed) domain should be carefully
considered, but your message gives no evidence that it has been.
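
A rough sketch of that kind of check, for illustration only. The data
here are simulated, and the names `dat`, `x`, `y` and the 2*p/n leverage
cutoff are assumptions standing in for the real (undescribed) data and
for whatever criterion the diagnostic plots suggested:

```r
# Simulated stand-in for the real data; `dat`, `x`, `y` are hypothetical names.
set.seed(1)
n <- 1000
x <- c(rnorm(n - 10), rnorm(10, mean = 8))  # a few points far out in x
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)

# Leverage (hat values). 2 * p / n is a common rule-of-thumb cutoff, nothing more.
h <- hatvalues(fit)
p <- length(coef(fit))
high <- h > 2 * p / n

# Descriptive statistics: where do the high-leverage points sit in the range of x?
summary(dat$x[high])
summary(dat$x[!high])

# Refit without them and compare the coefficients, not only R-squared.
fit_trim <- lm(y ~ x, data = dat[!high, ])
rbind(all = coef(fit), trimmed = coef(fit_trim))
c(all = summary(fit)$r.squared, trimmed = summary(fit_trim)$r.squared)

# One simple check for curvature: does a quadratic term improve the fit?
fit_quad <- lm(y ~ poly(x, 2), data = dat)
anova(fit, fit_quad)
```

In this simulation the relationship is exactly linear, yet R-squared
still drops once the high-leverage points are removed, simply because
the range of x has been narrowed. A lower R-squared after trimming is
therefore not, by itself, evidence that those points were distorting
the fit; comparing the coefficients is usually more informative.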
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.