I'm trying to write a formula method for canonical correlation analysis, that could be called similarly to lm() for
a multivariate response:

cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
or perhaps more naturally,
cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)

I've adapted the code from lm() to my case, but in this situation, it doesn't make sense to include an intercept, since X & Y are mean centered by default in the computation.

In the code below, I can't see where the intercept gets included in the model matrix and therefore how to suppress it. There is a test case at the end, showing that the method fails when called normally, but works if I explicitly use -1 in the formula. I could hack the result of model.matrix(),
but maybe there's an easier way?

cancor <- function(x, ...) {
    UseMethod("cancor", x)
}

cancor.default <- candisc:::cancor

# TODO: make cancisc::cancor() use x, y, not X, Y
cancor.formula <- function(formula, data, subset, weights,
        na.action,
        method = "qr",
    model = TRUE,
    x = FALSE, y = FALSE, qr = TRUE,
    contrasts = NULL, ...) {

    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action"), names(mf), 0L)
    mf <- mf[c(1L, m)]

    mf[[1L]] <- as.name("model.frame")
    mf <- eval(mf, parent.frame())

    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w))
        stop("'weights' must be a numeric vector")

    x <- model.matrix(mt, mf, contrasts)
    # fixup to remove intercept???
    z <- if (is.null(w))
        cancor.default(x, y,  ...)
    else stop("weights are not yet implemented")  # lm.wfit(x, y, w,  ...)

    z$call <- cl
    z$terms <- mt
        z
}

TESTME <- FALSE
if (TESTME) {

# need to get latest version, 0.6-1 from R-Forge
install.packages("candisc", repo="http://R-Forge.R-project.org";)
library(candisc)

data(Rohwer)

# this bombs: needs intercept removed
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer)
## Error in chol.default(Rxx) :
##  the leading minor of order 1 is not positive definite

#this works as is
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na + ss, data=Rohwer)
cc
## Canonical correlation analysis of:
##       5   X  variables:  n, s, ns, na, ss
##   with        3   Y  variables:  SAT, PPVT, Raven
##
##     CanR  CanRSQ   Eigen percent    cum
## 1 0.6703 0.44934 0.81599   77.30  77.30
## 2 0.3837 0.14719 0.17260   16.35  93.65
## 3 0.2506 0.06282 0.06704    6.35 100.00
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
  ...
}


--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to