Returning to my original post, I still believe that a basic workhorse like cor(data.frame) with the default method="pearson" should do something more useful than barf with a misleading error message when the data frame contains factor or character variables.
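
As a rough sketch of the kind of behaviour I mean (not a proposed patch to stats), here is a small wrapper that keeps only the numeric columns and says which ones it dropped. The name cor_numeric and the wording of its warning are made up for illustration; everything else is base R, and any extra arguments (method, use) are simply passed through to cor().

```r
# Hypothetical helper, not part of stats: correlate only the numeric
# columns of a data frame and report which columns were left out.
cor_numeric <- function(df, ...) {
  stopifnot(is.data.frame(df))
  keep <- vapply(df, is.numeric, logical(1))   # TRUE for numeric columns
  if (!any(keep)) stop("no numeric columns in data frame")
  if (!all(keep)) {
    warning("dropping non-numeric columns: ",
            paste(names(df)[!keep], collapse = ", "))
  }
  # drop = FALSE keeps a data frame even if only one column survives
  cor(df[, keep, drop = FALSE], ...)
}

# Same result as cor(iris[, sapply(iris, is.numeric)]), plus a warning
# that names Species as the dropped column.
cor_numeric(iris)
```

Whether dropping columns with a warning is the right default is debatable; a clearer error message naming the offending columns would be another reasonable choice.
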
To paraphrase Einstein, ``Things [in R] should be made as simple as possible, but not any simpler.'' The case that Andy Liaw cited is a good example of the 'not any simpler' part.

-Michael

Gabor Grothendieck wrote:
> You are right but I was just trying to stick to the same example.
> In reality it would be ok as long as it's an ordered factor.  One could
> restrict it to those of class "ordered".
>
>
> On Dec 3, 2007 1:58 PM, Liaw, Andy <[EMAIL PROTECTED]> wrote:
>> I'd call that another infelicity.  Species is supposed to be nominal,
>> not ordinal, so rank correlation wouldn't make much sense.  So what does
>> cor(, method="kendall") do?  It looks like it simply uses the underlying
>> numeric code.  (Change Species to numerics and you'll see the same
>> answer.)  However, reordering the levels changes the result:
>>
>> R> iris2 <- iris
>> R> levels(iris2$Species) <- levels(iris2$Species)[c(2, 1, 3)]
>> R> cor(iris2, method = "kendall")
>>              Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
>> Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086 0.1897778
>> Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 0.1439793
>> Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907 0.2677154
>> Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000 0.2724843
>> Species        0.18977778  0.14397927    0.2677154   0.2724843 1.0000000
>>
>> To me, this is dangerous!
>>
>> Andy
>>
>>
>> From: Gabor Grothendieck
>>
>>> You can calculate the Kendall rank correlation with such a matrix
>>> so you would not want to exclude factors in that case:
>>>
>>> > cor(iris, method = "kendall")
>>>              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
>>> Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086  0.6704444
>>> Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 -0.3376144
>>> Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907  0.8229112
>>> Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000  0.8396874
>>> Species        0.67044444 -0.33761438    0.8229112   0.8396874  1.0000000
>>>
>>>
>>> On Dec 3, 2007 9:27 AM, Michael Friendly <[EMAIL PROTECTED]> wrote:
>>>> In using cor(data.frame), it is annoying that you have to explicitly
>>>> filter out non-numeric columns, and when you don't, the error message
>>>> is misleading:
>>>>
>>>> > cor(iris)
>>>> Error in cor(iris) : missing observations in cov/cor
>>>> In addition: Warning message:
>>>> In cor(iris) : NAs introduced by coercion
>>>>
>>>> It would be nicer if stats:::cor() did the equivalent *itself* of the
>>>> following for a data.frame:
>>>> > cor(iris[,sapply(iris, is.numeric)])
>>>>              Sepal.Length Sepal.Width Petal.Length Petal.Width
>>>> Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
>>>> Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
>>>> Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
>>>> Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
>>>> >
>>>>
>>>> A change could be implemented here:
>>>>     if (is.data.frame(x))
>>>>         x <- as.matrix(x)
>>>>
>>>> Second, the default, use="all" throws an error if there are any
>>>> NAs.  It would be nicer if the default was use="complete.obs",
>>>> which would generate warnings instead.  Most other statistical
>>>> software is more tolerant of missing data.
>>>>
>>>> > library(corrgram)
>>>> > data(auto)
>>>> > cor(auto[,sapply(auto, is.numeric)])
>>>> Error in cor(auto[, sapply(auto, is.numeric)]) :
>>>>   missing observations in cov/cor
>>>> > cor(auto[,sapply(auto, is.numeric)],use="complete")
>>>> # works; output elided
>>>>
>>>> -Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.