Dear John and Stas,

Thanks so much for your help.

John, I did the correlation on the complete dataset (no missing values). I tried what you suggested and you were right: hetcor with pd=FALSE gives me the same result as polychor.

Anyway, thanks to the answers I got in the forum, I understand I should not use these two variables in the analysis I wanted to do (at least in their present state; maybe within an index...). And I should also read (heaps) more on polychoric correlation/correlation with categorical data... :-)

Thanks for all your help!

Cheers,
Dorothee.


Stas Kolenikov wrote:
>
> Olsson's original paper
> (http://www.citeulike.org/user/ctacmo/article/553309) did mention that
> the greatest biases and numeric problems were encountered when the two
> variables had opposite skewness. Your example is even more extreme:
> tetrachoric and polychoric correlations do not like zero counts. It
> actually means that your data sit on a straight line, but that line
> does not pass through the intersection of the thresholds. The nominal
> estimate of the correlation should be 1, and what you see should not
> be significantly different from 1. No wonder you get LAPACK errors: at
> some point, you had to invert matrix( c(1,1,1,1), 2, 2) or compute its
> determinant in the ML computations. My own Stata implementation of
> polychoric correlation choked on your data and stopped with an
> error... which I should've handled more gracefully :)). The data with
> 0.5 added produced the same correlation estimate but different
> standard errors.
>
> John Fox offered all other feasible explanations, like handling of
> missing data in the pairwise and full data set computations. But with
> unstable computations you can end up just anywhere on the range of
> estimates; the standard errors should tell you that your estimate is
> quite imprecise.
>
> On 1/12/09, Dorothee <ddurp...@gmail.com> wrote:
>>
>> Hello,
>>
>> I am running polychoric correlations on a dataset composed of 12 ordinal
>> and binary variables (N = 384), using the polycor package.
>> One of the associations (between 2 dichotomous variables) is very high
>> using the 2-step estimate (0.933 when polychoric is run only between the
>> two variables, but 0.801 when polychoric is run on the 12 variables). The
>> same correlation run with the ML estimate returns a singularity message.
>>
>> First, I would like to know why the estimations between only the two
>> dichotomous variables and with all the variables at once (with the
>> 2-step estimate) return slightly different results.
>>
>> Secondly, when I checked back the distribution of these two dichotomous
>> variables, they appear about symmetrically opposed. Therefore, one should
>> indeed expect a strong association between them, but a negative one,
>> shouldn't one? Why does the polychoric correlation return a positive
>> coefficient? What does it mean for the rest of the coefficients, should
>> I trust them?
>>
>> I have to say I'm new to R and not very strong in statistics, I hope I
>> haven't posted a stupid question...
>>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/polychoric-correlation%3A-issue-with-coefficient-sign-tp21425977p21464084.html
Sent from the R help mailing list archive at Nabble.com.
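
Stas's remark about having to invert matrix( c(1,1,1,1), 2, 2) can be illustrated with a small sketch. It is written in plain Python rather than R, and it is not the code that polycor, LAPACK, or the Stata implementation actually runs. It only assumes that the ML computation works with a 2x2 correlation matrix [[1, rho], [rho, 1]], whose determinant is 1 - rho^2. The helper names `det2`, `inv2`, and `corr_matrix` are hypothetical and exist only for this illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c


def inv2(m):
    """Inverse of a 2x2 matrix; raises ZeroDivisionError if it is singular."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ZeroDivisionError("matrix is singular (determinant is 0)")
    return [[d / det, -b / det], [-c / det, a / det]]


def corr_matrix(rho):
    """2x2 correlation matrix for a correlation rho (illustrative helper)."""
    return [[1.0, rho], [rho, 1.0]]


# As rho approaches 1 the determinant 1 - rho^2 shrinks toward 0 and the
# entries of the inverse blow up; at rho = 1 the matrix is all ones and
# cannot be inverted at all.
for rho in (0.5, 0.9, 0.999, 1.0):
    m = corr_matrix(rho)
    try:
        inv = inv2(m)
        print(f"rho={rho}: det={det2(m):.6f}, inverse[0][0]={inv[0][0]:.1f}")
    except ZeroDivisionError as err:
        print(f"rho={rho}: det={det2(m):.6f}, cannot invert: {err}")
```

The closer the estimate gets to 1, the less stable any step that divides by the determinant becomes, which is consistent with the singularity message reported for the ML estimate.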