Thank you for your replies; you've answered my question and given me more to
think about. I take it that it is unwise to draw any conclusions from the
standardised results, for these reasons.
James.
--On 22 August 2011 17:30 +0100 ted.hard...@wlandres.net wrote:
On 22-Aug-11 15:37:40, JC Matthews wrote:
Hello,
I have a statistical problem that I am using R for, but I am
not making sense of the results. I am trying to use multiple
regression to explore which variables (weather conditions)
have the greater effect on a local atmospheric variable.
The data is taken from a database that has 20391 data points (Z1).
A simplified version of the data I'm looking at is given below,
but I have a problem in that there is a disagreement in sign
between the regression coefficients and the standardised regression
coefficients. Intuitively I would expect both to be the same sign,
but in many of the parameters, they are not.
I am aware that some people strongly discourage the use of standardised
regression coefficients, but I would nevertheless like to see the
results, not least because the disagreement has made me doubt the
non-standardised values of B that R has given me.
The code I have used, and some of the data, is as follows (once
the database has been imported from SQL, and outliers removed).
Z1sub <- Z1[, c(2, 5, 7,11, 12, 13, 15, 16)]
colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad",
"mean1", "sd1" )
attach(Z1sub)
names(Z1sub)
Model1d <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2))
summary(Model1d)
Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
I(rain^2))
Residuals:
     Min       1Q   Median       3Q      Max
-1230.64   -63.17    18.51    97.85  1275.73
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 180.5 on 20337 degrees of freedom
Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
To calculate the standardised coefficients, I used the following:
Z1sub.scaled <- data.frame(scale(Z1sub[, c('temp', 'hum', 'wind', 'press',
'rain', 's.rad', 'mean1', 'sd1')]))
attach(Z1sub.scaled)
names(Z1sub.scaled)
Model1d.sc <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2))
summary(Model1d.sc)
Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
I(rain^2))
Residuals:
     Min       1Q   Median       3Q      Max
-5.94713 -0.30527  0.08946  0.47287  6.16503
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
hum:rain       0.0267420  0.0146201   1.829  0.06740 .
wind:rain      0.0365615  0.0122335   2.989  0.00281 **
hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8723 on 20337 degrees of freedom
Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
So having, for humidity (hum) for instance, B = 28.35 +/- 1.468 while
Beta = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is
there an error in my code that has caused this contradiction?
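The two fits can also be written without attach(), passing each data frame through the data argument of lm() so that it is explicit which copy of 'hum', 'wind' and 'rain' each model sees. The following is only a sketch: the data frame 'dat' is a simulated stand-in for Z1sub, and its size, distributions and generating coefficients are all hypothetical. It also compares the t values of the two fits, since in the summaries above the terms that no other term contains (the three squares and hum:wind:rain) show the same t values before and after scaling, while the lower-order terms do not.

```r
set.seed(42)
n <- 2000

# Simulated stand-in for Z1sub (hypothetical values, not the poster's data)
dat <- data.frame(hum  = rnorm(n, mean = 80, sd = 10),
                  wind = rnorm(n, mean = 5,  sd = 2),
                  rain = rexp(n, rate = 2))
dat$mean1 <- with(dat, 50 + 2*hum + 10*wind - 0.3*hum*wind +
                       0.05*hum*wind*rain - 0.01*hum^2 + rnorm(n, sd = 20))

form <- mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) + I(rain^2)

# Same formula, two data frames: original units and fully standardised
fit.raw <- lm(form, data = dat)
fit.sc  <- lm(form, data = data.frame(scale(dat)))

t.raw <- coef(summary(fit.raw))[, "t value"]
t.sc  <- coef(summary(fit.sc))[, "t value"]

# Side-by-side t values for every term
print(round(cbind(raw = t.raw, scaled = t.sc), 3))

# Terms not contained in any higher-order term are only rescaled,
# so their t values should agree up to numerical error
top <- c("I(hum^2)", "I(wind^2)", "I(rain^2)", "hum:wind:rain")
print(all.equal(t.raw[top], t.sc[top]))
```

Because both models span the same set of fitted surfaces, R-squared and the F-statistic are unchanged by scaling; only the meaning of the lower-order coefficients shifts, from "slope where the other variables are 0" to "slope where the other variables are at their means".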
Many thanks,
James.
----------------------
JC Matthews
School of Chemistry
Bristol University
Hi,
without having your data, so unable to check, I would not be
surprised if the changes of sign were the outcome of your model
formula, in particular the 3-variable (2nd-order) interaction,
i.e. you are using a model which is non-linear in the variables
themselves. Let's just take that part of the model:
lm(formula = mean1 ~ hum * wind * rain)
This, in its quantitative expression, expands to:
mean1 = C0 + C11*hum + C12*wind + C13*rain
           + C21*hum*wind + C22*hum*rain + C23*wind*rain
           + C31*hum*wind*rain
Suppose that is for the unstandardised variables. Now express
it in terms of standardised variables (initial capital letters):
mean1 = C0 + C11*sd(hum)*(Hum + mean(hum)/sd(hum))
           + C12*sd(wind)*(Wind + mean(wind)/sd(wind))
           + C13*sd(rain)*(Rain + mean(rain)/sd(rain))
           + C21*sd(hum)*sd(wind)*
               (Hum + mean(hum)/sd(hum))*(Wind + mean(wind)/sd(wind))
           + C22*sd(hum)*sd(rain)*
               (Hum + mean(hum)/sd(hum))*(Rain + mean(rain)/sd(rain))
           + C23*sd(wind)*sd(rain)*
               (Wind + mean(wind)/sd(wind))*
               (Rain + mean(rain)/sd(rain))
           + C31*sd(hum)*sd(wind)*sd(rain)*
               (Hum + mean(hum)/sd(hum))*
               (Wind + mean(wind)/sd(wind))*
               (Rain + mean(rain)/sd(rain))
Now pick out, say, the coefficient of 'Hum' in this latter expression
(i.e. all the terms which involve 'Hum' but neither 'Wind' nor 'Rain'):
    C11*sd(hum)
  + C21*sd(hum)*sd(wind)*mean(wind)/sd(wind)
  + C22*sd(hum)*sd(rain)*mean(rain)/sd(rain)
  + C31*sd(hum)*sd(wind)*sd(rain)*
      (mean(wind)/sd(wind))*(mean(rain)/sd(rain))
  = C11*sd(hum)
  + C21*sd(hum)*mean(wind)
  + C22*sd(hum)*mean(rain)
  + C31*sd(hum)*mean(wind)*mean(rain)
So there is no reason to expect this to have even the same sign
as the original C11, the coefficient of 'hum', let alone any more
specific relationship with it!
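As a quick numerical check of the expression above, here is a minimal sketch in R on simulated data. Everything in it is made up for illustration and is not the poster's data: the sample size, the means and SDs of the predictors, the generating coefficients, and the capitalised names Hum, Wind and Rain for the standardised predictors. Only the predictors are standardised, as in the algebra above; standardising the response as well divides every coefficient by sd(mean1), which is positive and so cannot change a sign.

```r
set.seed(1)
n <- 5000

# Hypothetical predictors with non-zero means
hum  <- rnorm(n, mean = 80, sd = 10)
wind <- rnorm(n, mean = 5,  sd = 2)
rain <- rnorm(n, mean = 1,  sd = 0.5)

# Hypothetical response: positive 'hum' slope at wind = 0, rain = 0,
# but negative interactions that dominate near the means
mean1 <- 10 + 3*hum + 20*wind - 5*rain -
         1*hum*wind + 0.2*hum*rain + 1*wind*rain -
         0.05*hum*wind*rain + rnorm(n, sd = 5)

# Fit on the raw variables
raw <- lm(mean1 ~ hum * wind * rain)
b   <- coef(raw)

# Fit on the standardised predictors
Hum  <- as.numeric(scale(hum))
Wind <- as.numeric(scale(wind))
Rain <- as.numeric(scale(rain))
std  <- lm(mean1 ~ Hum * Wind * Rain)

# Coefficient of 'Hum' predicted from the raw coefficients:
# C11*sd(hum) + C21*sd(hum)*mean(wind) + C22*sd(hum)*mean(rain)
#             + C31*sd(hum)*mean(wind)*mean(rain)
predicted <- sd(hum) * (b["hum"] +
                        b["hum:wind"]      * mean(wind) +
                        b["hum:rain"]      * mean(rain) +
                        b["hum:wind:rain"] * mean(wind) * mean(rain))

print(c(raw.hum = unname(b["hum"]),
        std.Hum = unname(coef(std)["Hum"]),
        predicted = unname(predicted)))
print(all.equal(unname(coef(std)["Hum"]), unname(predicted)))
```

With generating values like these, the raw coefficient of 'hum' is positive while the coefficient of 'Hum' is negative, and the latter should agree with the expression above up to rounding error. In a model that also contains I(hum^2) with coefficient C, the same expansion adds a further 2*C*sd(hum)*mean(hum) to the coefficient of 'Hum'.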
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-11 Time: 17:30:29
------------------------------ XFMail ------------------------------
----------------------
JC Matthews
Atmospheric Chemistry Research Group
School of Chemistry
Bristol University
j.c.matth...@bristol.ac.uk