[Rd] inconsistent handling of factor, character, and logical predictors in lm()

Fox, John Fri, 30 Aug 2019 11:11:58 -0700

Dear R-devel list members,

I've discovered an inconsistency in how lm() and similar functions handle 
logical predictors as opposed to factor or character predictors. The $xlevels 
component of an "lm" object records the levels of each factor predictor and 
the unique values of each character predictor, but not the FALSE/TRUE values 
of a logical predictor, even though the latter is treated as a factor in the 
fit.


For example:

------------ snip --------------

> m1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris)
> m1$xlevels
$Species
[1] "setosa"     "versicolor" "virginica" 
 
> m2 <- lm(Sepal.Length ~ Sepal.Width + as.character(Species), data=iris)
> m2$xlevels
$`as.character(Species)`
[1] "setosa"     "versicolor" "virginica" 

> m3 <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data=iris)
> m3$xlevels
named list()

> m3

Call:
lm(formula = Sepal.Length ~ Sepal.Width + I(Species == "setosa"), 
    data = iris)

Coefficients:
               (Intercept)                 Sepal.Width  I(Species == "setosa")TRUE  
                    3.5571                      0.9418                     -1.7797  

------------ snip --------------

I believe that the culprit is .getXlevels(), which makes provision for factor 
and character predictors but not for logical predictors:

------------ snip --------------

> .getXlevels
function (Terms, m) 
{
    xvars <- vapply(attr(Terms, "variables"), deparse2, 
        "")[-1L]
    if ((yvar <- attr(Terms, "response")) > 0) 
        xvars <- xvars[-yvar]
    if (length(xvars)) {
        xlev <- lapply(m[xvars], function(x) if (is.factor(x)) 
            levels(x)
        else if (is.character(x)) 
            levels(as.factor(x)))
        xlev[!vapply(xlev, is.null, NA)]
    }
}

------------ snip --------------

It would be simple to modify the last test in .getXlevels() to 

        else if (is.character(x) || is.logical(x))

which would cause .getXlevels() to return c("FALSE", "TRUE") (assuming both 
values are present in the data). I'd find that sufficient, but alternatively 
there could be a separate test for logical predictors that returns c(FALSE, 
TRUE).
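
As a minimal sketch of the first variant, the following defines a stand-alone 
copy of the function with the extra is.logical() test. The name 
getXlevelsLogical is hypothetical, and because deparse2() is internal to the 
stats package, deparse() with a wide cutoff stands in for it here. This only 
illustrates the proposal; it is not a patch to R itself.

```r
# Hypothetical stand-alone variant of .getXlevels() with the proposed change.
getXlevelsLogical <- function(Terms, m) {
    # Deparse each variable in the terms object; [-1L] drops the leading `list`.
    # Assumption: deparse() with a wide cutoff stands in for stats' internal
    # deparse2().
    xvars <- vapply(attr(Terms, "variables"),
                    function(v) paste(deparse(v, width.cutoff = 500L),
                                      collapse = " "),
                    "")[-1L]
    if ((yvar <- attr(Terms, "response")) > 0)
        xvars <- xvars[-yvar]
    if (length(xvars)) {
        xlev <- lapply(m[xvars], function(x) {
            if (is.factor(x))
                levels(x)
            else if (is.character(x) || is.logical(x))  # proposed change
                levels(as.factor(x))
        })
        # Keep only the predictors for which levels were recorded.
        xlev[!vapply(xlev, is.null, NA)]
    }
}

m3 <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data = iris)
xl <- getXlevelsLogical(terms(m3), model.frame(m3))
```

With this variant, xl has a single component, named after the I() term, that 
holds c("FALSE", "TRUE").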

I discovered this issue when a function in the effects package failed for a 
model with a logical predictor. Although it's possible to program around the 
problem, I think that it would be better to handle factors, character 
predictors, and logical predictors consistently.
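
One way of programming around the problem might look like the sketch below, 
which reads the values of logical predictors from the model frame instead of 
relying on $xlevels. The helper name logicalLevels is hypothetical and this is 
not the code used in the effects package; it assumes the fitted object 
supports model.frame() and terms().

```r
# Hypothetical helper: list the observed values of each logical predictor.
logicalLevels <- function(model) {
    mf <- model.frame(model)
    # Drop the response column, if the model has one.
    yvar <- attr(terms(model), "response")
    if (yvar > 0)
        mf <- mf[-yvar]
    is_lgl <- vapply(mf, is.logical, NA)
    # as.vector() strips the "AsIs" class that I() adds to the column.
    lapply(mf[is_lgl], function(x) sort(unique(as.vector(x))))
}

m3 <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data = iris)
lv <- logicalLevels(m3)
```

Unlike the character-based variant, this returns the logical values 
themselves, so lv holds c(FALSE, TRUE) for the I() term.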

Best,
 John

--------------------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
