Re: [R] how to suppress the intercept in an lm()-like formula method?

Michael Friendly Tue, 29 Jan 2013 07:26:38 -0800

On 1/29/2013 9:25 AM, Duncan Murdoch wrote:

On 29/01/2013 9:14 AM, Michael Friendly wrote:
To partly answer my own question:  It wasn't that hard to hack the
result of model.matrix() to remove the intercept,
remove.intercept <- function(x) {
    if (colnames(x)[1] == "(Intercept)") {
        x <- x[,-1]
        attr(x, "assign") <- attr(x, "assign")[-1]
    }
    x
}
I think you need to do some of the low level calculations yourself,specifically the "terms" calculation.
For example, this forces no intercept, regardless of what the userspecifies. You might prefer just to change the default to nointercept and allow the user to use "+1" to add one, which looksharder...
# .... set formula to the user's formula ...
# Now modify it to suppress the intercept:

class(formula) <- c("nointercept", class(formula))
terms.nointercept <- function(x, ...) {
  result <- NextMethod(x, ...)
  attr(result, "intercept") <- 0
  result
}

That's very clever. I think I can use this locally in my function.However, it is still a mystery to me *where* in the process terms()gets called. The main incantation for model formulae in lm() is below,so who calls terms()?


    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)

m <- match(c("formula", "data", "subset", "weights", "na.action"),names(mf), 0L)

    mf <- mf[c(1L, m)]

    mf[[1L]] <- as.name("model.frame")
    mf <- eval(mf, parent.frame())

    mt <- attr(mf, "terms")

It is also a mystery to me where na.action takes its effect. Presumablythis is somewhere before lm.fit(x, y, ...)gets called, but I'd like to take account of this in my cancor.defaultmethod.


-Michael

Now lm(formula) does a fit with no intercept.

Duncan Murdoch


However, the model frame and therefore the model terms stored in the
object are wrong, still including the intercept:

Browse[1]> str(mt)
Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns
+ na + ss
    ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n,
s, ns, na, ss)
    ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
    .. ..- attr(*, "dimnames")=List of 2
    .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ...
    .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ...
    ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ...
    ..- attr(*, "order")= int [1:5] 1 1 1 1 1
    ..- attr(*, "intercept")= int 1
    ..- attr(*, "response")= int 1
    ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven),n, s,

ns, na, ss)
    ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric"
"numeric" "numeric" ...
    .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s"
"ns" ...
Browse[1]>


On 1/29/2013 8:44 AM, Michael Friendly wrote:

> I'm trying to write a formula method for canonical correlationanalysis,

> that could be called similarly to lm() for
> a multivariate response:
>
> cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
> or perhaps more naturally,
> cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
>
> I've adapted the code from lm() to my case, but in this situation, it
> doesn't make sense to
> include an intercept, since X & Y are mean centered by default in the
> computation.
>

> In the code below, I can't see where the intercept gets included inthe

> model matrix and therefore
> how to suppress it.  There is a test case at the end, showing that the
> method fails when called

> normally, but works if I explicitly use -1 in the formula. I couldhack

> the result of model.matrix(),
> but maybe there's an easier way?
>
> cancor <- function(x, ...) {
>      UseMethod("cancor", x)
> }
>
> cancor.default <- candisc:::cancor
>
> # TODO: make cancisc::cancor() use x, y, not X, Y
> cancor.formula <- function(formula, data, subset, weights,
>          na.action,
>          method = "qr",
>      model = TRUE,
>      x = FALSE, y = FALSE, qr = TRUE,
>      contrasts = NULL, ...) {
>
>      cl <- match.call()
>      mf <- match.call(expand.dots = FALSE)

> m <- match(c("formula", "data", "subset", "weights","na.action"),

> names(mf), 0L)
>      mf <- mf[c(1L, m)]
>
>      mf[[1L]] <- as.name("model.frame")
>      mf <- eval(mf, parent.frame())
>
>      mt <- attr(mf, "terms")
>      y <- model.response(mf, "numeric")
>      w <- as.vector(model.weights(mf))
>      if (!is.null(w) && !is.numeric(w))
>          stop("'weights' must be a numeric vector")
>
>      x <- model.matrix(mt, mf, contrasts)
>      # fixup to remove intercept???
>      z <- if (is.null(w))
>          cancor.default(x, y,  ...)

> else stop("weights are not yet implemented") # lm.wfit(x, y,w, ...)

>
>      z$call <- cl
>      z$terms <- mt
>          z
> }
>
> TESTME <- FALSE
> if (TESTME) {
>
> # need to get latest version, 0.6-1 from R-Forge
> install.packages("candisc", repo="http://R-Forge.R-project.org";)
> library(candisc)
>
> data(Rohwer)
>
> # this bombs: needs intercept removed
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  n + s + ns + na + ss,
> data=Rohwer)
> ## Error in chol.default(Rxx) :
> ##  the leading minor of order 1 is not positive definite
>
> #this works as is
> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  -1 + n + s + ns + na +
> ss, data=Rohwer)
> cc
> ## Canonical correlation analysis of:
> ##       5   X  variables:  n, s, ns, na, ss
> ##   with        3   Y  variables:  SAT, PPVT, Raven
> ##
> ##     CanR  CanRSQ   Eigen percent    cum
> ## 1 0.6703 0.44934 0.81599   77.30  77.30
> ## 2 0.3837 0.14719 0.17260   16.35  93.65
> ## 3 0.2506 0.06282 0.06704    6.35 100.00
> ##
> ## Test of H0: The canonical correlations in the
> ## current row and all that follow are zero
> ##
>    ...
> }
>
>



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to suppress the intercept in an lm()-like formula method?

Reply via email to