Hi JC,

You have interactions in your model, which means that your model specifies that the coefficients for hum, wind, and rain vary depending on the values of the other two (and on their own values, since you also have quadratic effects for each of these variables). Because these coefficients vary according to the model, it is impossible to specify their values unconditionally. The values you are seeing are therefore conditional estimates: each one is the effect of a predictor at a particular value (zero) of the variables with which it interacts. Since standardizing moves the zero point of those variables to their means, you get different conditional estimates.
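To make this concrete, here is a minimal sketch on simulated data (not your Z1 data; the sample size, ranges, and true coefficients are made up for illustration, and the model is cut down to a single hum:wind interaction). The raw coefficient for hum is the slope at wind = 0, while the standardized one is the slope at mean wind, so the two can legitimately differ in sign.

```r
# Simulated data: all values and coefficients here are assumptions for illustration
set.seed(1)
n    <- 500
hum  <- runif(n, 40, 100)
wind <- runif(n, 0, 15)
# True slope of hum is 2 - 0.5 * wind: positive at wind = 0, negative near mean wind
mean1 <- 100 + 2 * hum + 30 * wind - 0.5 * hum * wind + rnorm(n, sd = 20)
d <- data.frame(hum, wind, mean1)

# Same model fitted to raw and to standardized variables
# (using data = instead of attach() avoids picking up the wrong columns)
fit.raw <- lm(mean1 ~ hum * wind, data = d)
d.sc    <- data.frame(scale(d))
fit.sc  <- lm(mean1 ~ hum * wind, data = d.sc)

coef(fit.raw)["hum"]  # slope of hum when wind = 0
coef(fit.sc)["hum"]   # slope of (scaled) hum when wind is at its mean

# Recover the standardized coefficient from the raw fit:
# evaluate the raw slope of hum at mean wind, then rescale by sd(hum) / sd(mean1)
b <- coef(fit.raw)
slope.at.mean <- (b["hum"] + b["hum:wind"] * mean(d$wind)) * sd(d$hum) / sd(d$mean1)
all.equal(unname(slope.at.mean), unname(coef(fit.sc)["hum"]))
```

The two fits are the same model in different parameterizations (note that your R-squared and F-statistic are identical in both summaries); only the point at which each conditional slope is evaluated has changed.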
All this will be covered in most regression textbooks.

Best,
Ista

On Mon, Aug 22, 2011 at 11:37 AM, JC Matthews <j.c.matth...@bristol.ac.uk> wrote:
>
> Hello,
>
> I have a statistical problem that I am using R for, but I am not making
> sense of the results. I am trying to use multiple regression to explore
> which variables (weather conditions) have the greater effect on a local
> atmospheric variable. The data is taken from a database that has 20391 data
> points (Z1).
>
> A simplified version of the data I'm looking at is given below, but I have a
> problem in that there is a disagreement in sign between the regression
> coefficients and the standardised regression coefficients. Intuitively I
> would expect both to be the same sign, but in many of the parameters, they
> are not.
>
> I am aware that there is a strong opinion that using standardised
> correlation coefficients is highly discouraged by some people, but I would
> nevertheless like to see the results. Not least because it has made me doubt
> the non-standardised values of B that R has given me.
>
> The code I have used, and some of the data, is as follows (once the database
> has been imported from SQL, and outliers removed).
>
>
> Z1sub <- Z1[, c(2, 5, 7,11, 12, 13, 15, 16)]
> colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad",
> "mean1", "sd1" )
>
> attach(Z1sub)
> names(Z1sub)
>
>
> Model1d <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2) )
>
> summary(Model1d)
>
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>     I(rain^2))
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1230.64   -63.17    18.51    97.85  1275.73
>
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
> hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
> wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
> rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
> I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
> I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
> I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
> hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
> hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
> wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
> hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 180.5 on 20337 degrees of freedom
> Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
>
>
> To calculate the standardised coefficients, I used the following:
>
> Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind', 'press',
> 'rain', 's.rad', 'mean1', 'sd1' ) ] ) )
>
> attach(Z1sub.scaled)
> names(Z1sub.scaled)
>
>
> Model1d.sc <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2) )
>
> summary(Model1d.sc)
>
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>     I(rain^2))
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -5.94713 -0.30527  0.08946  0.47287  6.16503
>
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
> hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
> wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
> rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
> I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
> I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
> I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
> hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
> hum:rain       0.0267420  0.0146201   1.829  0.06740 .
> wind:rain      0.0365615  0.0122335   2.989  0.00281 **
> hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8723 on 20337 degrees of freedom
> Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
>
>
> So having, for instance for humidity (hum), B = 28.35 +/- 1.468, while Beta
> = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is there an
> error in my code that has caused this contradiction?
>
> Many thanks,
>
> James.
>
>
> ----------------------
> JC Matthews
> School of Chemistry
> Bristol University
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org