On Sat, 5 Dec 2009, Juliet Hannah wrote:
Your R code looks correct.
There are a couple of hiccups.
First the degrees of freedom for the partial correlation would be wrong
even if there was no missing data.
Because this is a straightforward calculation, I would be surprised if there
were any differences with SPSS.
There are differences. SPSS seems to use the correlation matrix computed
with a pairwise present method and compute partial correlations from that.
Following
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-matrices:part_corr
R.pp <- cor(cbind(x,y,z1,z2),use='pair')
R.comp <- cor(cbind(x,y,z1,z2),use='complete')
Rinv <- solve(R.pp)
D <- diag(1 / sqrt(diag(Rinv)))
P <- -D %*% Rinv %*% D
P[1,2]
[1] 0.4596122
Rinv <- solve(R.comp)
D <- diag(1 / sqrt(diag(Rinv)))
P <- -D %*% Rinv %*% D
P[1,2]
[1] 0.657214
The pairwise present value seems to be what SPSS is reporting.
The complete cases values is nearly (but not the same as) what you got.
A real issue here is how to usefully compute and test partial
correlations in the presence of missing data. If you want to persue that,
I would suggest opening a new thread with a subject line like 'partial
correlations with missing observations'
HTH,
Chuck
It may be worthwhile to check
if SPSS gives partial correlations or semipartial correlations. For example,
if you take the correlation between
py <- resid(lm(y ~ z1 + z2,data=mydat2))
and
x
where mydat2 has missing values removed, you get 0.47.
On Tue, Dec 1, 2009 at 8:24 PM, dadrivr <dadr...@gmail.com> wrote:
I am trying to calculate a partial correlation and p-values. Unfortunately,
the results in R are different than what SPSS gives.
Here is an example in R (calculating the partial correlation of x and y,
controlling for z1 and z2):
x <- c(1,20,14,30,9,4,8)
y <- c(5,6,7,9,NA,10,6)
z1 <- c(13,8,16,14,26,13,20)
z2 <- c(12,NA,2,5,8,16,13)
fmx <- lm(x ~ z1 + z2, na.action = na.exclude)
fmy <- lm(y ~ z1 + z2, na.action = na.exclude)
yres <- resid(fmy)
xres <- resid(fmx)
cor(xres, yres, use = "p")
ct <- cor.test(xres, yres)
ct$estimate
ct$p.value
R give me:
r = .65, p = .23
However, SPSS calculates:
r = .46, p = .70
I think something may be different with R's handling of missing data, as
when I replace the NA's with values, R and SPSS give the same r-values,
albeit different p-values still. I am doing pairwise case exclusion in both
R and SPSS. Any ideas why I'm getting different values? Is something wrong
with my formula in R? Any help would be greatly appreciated. Thanks!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.