Re: [R] confused on model.frame evaluation

Marc Schwartz Fri, 30 Apr 2010 15:44:27 -0700

On Apr 30, 2010, at 4:57 PM, Erik Iverson wrote:

> <snip>
>> I'm sure it's not a bug, but could someone point to a thread or offer some 
>> gentle advice on what's happening?  I think it's related to:
>> test <- data.frame(name1 = 1:5, name2 = 6:10, test = 11:15)
>> eval(expression(test[c("name1", "name2")]))
>> eval(expression(interco[c("name1", "test")]))
> 
> scratch that last one, obviously a typo was causing my confusion there!  The 
> model.frame stuff remains a mystery to me though...



Hi Erik,

It's late on a Friday, it's grey and raining here in Minneapolis and I am short 
on caffeine, but, that being said, consider the following :-)


> working
  france manual famanual total working  no
1      1      1        1   107      85  22
2      1      1        0    65      44  21
3      1      0        1    66      24  42
4      1      0        0   171      17 154
5      0      1        1    87      24  63
6      0      1        0    65      22  43
7      0      0        1    85       1  84
8      0      0        0   148       6 142


> as.matrix(working[c("working", "no")])
     working  no
[1,]      85  22
[2,]      44  21
[3,]      24  42
[4,]      17 154
[5,]      24  63
[6,]      22  43
[7,]       1  84
[8,]       6 142


> with(working, as.matrix(working[c("working", "no")]))
     [,1]
[1,]   NA
[2,]   NA


For the incantations of model.frame(), the formula terms are evaluated first 
within the scope of the data frame indicated for the 'data' argument.

Thus, in the second case, I am asking for the as.matrix(...) call to be 
evaluated within the scope of the 'working' data frame. There, the name 
'working' resolves to the 'working' column (an unnamed numeric vector) rather 
than to the data frame itself, so indexing it by c("working", "no") returns one 
NA for each name that was asked for and not found. The result is a matrix with 
only two rows, which differs from the number of rows in 'working', so you get 
the error as soon as the 'france' column is evaluated in the formula to create 
the model frame:

Error in model.frame.default(formula = as.matrix(working[c("working",  :
 variable lengths differ (found for 'france')


2 rows in the response matrix versus 8 rows for 'france'...


It is kind of like you are asking for:

> as.matrix(working$working[c("working", "no")])
     [,1]
[1,]   NA
[2,]   NA
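
To make the name clash concrete, here is a small sketch that rebuilds the data 
frame shown above and checks what 'working' refers to inside with(). The alias 
'dat' is a hypothetical name I am introducing purely for illustration; it is 
not in Erik's code.

```r
# Rebuild the 'working' data frame from the printout above.
working <- data.frame(
  france   = c(1, 1, 1, 1, 0, 0, 0, 0),
  manual   = c(1, 1, 0, 0, 1, 1, 0, 0),
  famanual = c(1, 0, 1, 0, 1, 0, 1, 0),
  total    = c(107, 65, 66, 171, 87, 65, 85, 148),
  working  = c(85, 44, 24, 17, 24, 22, 1, 6),
  no       = c(22, 21, 42, 154, 63, 43, 84, 142)
)

# Inside with(), 'working' is found among the columns first, so it is the
# numeric column, not the data frame.
inside <- with(working, working)
is.data.frame(inside)   # FALSE

# Indexing that unnamed vector by name finds nothing: two NAs, a 2 x 1 matrix.
clash <- with(working, as.matrix(working[c("working", "no")]))
dim(clash)              # 2 1

# Hypothetical alias: 'dat' is not a column name, so the lookup falls through
# to the enclosing environment and finds the whole data frame.
dat <- working
no_clash <- with(dat, as.matrix(dat[c("working", "no")]))
dim(no_clash)           # 8 2
```

So the trouble comes from the data frame and one of its columns sharing a 
name, not from as.matrix() itself.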



Now, try this:

> with(working, matrix(c(working, no), ncol = 2))
     [,1] [,2]
[1,]   85   22
[2,]   44   21
[3,]   24   42
[4,]   17  154
[5,]   24   63
[6,]   22   43
[7,]    1   84
[8,]    6  142


and then:

> summary(glm(matrix(c(working, no), ncol = 2) ~ france + manual + famanual, 
+             data = working, family = binomial))

Call:
glm(formula = matrix(c(working, no), ncol = 2) ~ france + manual + 
    famanual, family = binomial, data = working)

Deviance Residuals: 
       1         2         3         4         5         6         7  
 0.09316  -0.14108   2.38028  -1.91838  -1.48196   1.84993  -1.61864  
       8  
 1.16747  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.6902     0.2547 -14.489  < 2e-16 ***
france        1.9474     0.2162   9.008  < 2e-16 ***
manual        2.5199     0.2168  11.625  < 2e-16 ***
famanual      0.5522     0.2017   2.738  0.00618 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 308.329  on 7  degrees of freedom
Residual deviance:  18.976  on 4  degrees of freedom
AIC: 60.162

Number of Fisher Scoring iterations: 4
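

As an aside, the more common way to write a two-column (successes, failures) 
response for a binomial glm() is cbind(). The sketch below rebuilds the data 
frame from the printout above and checks that both spellings give the same fit. 
The object names 'fit_matrix' and 'fit_cbind' are mine, chosen for 
illustration only.

```r
# Rebuild the 'working' data frame from the printout above.
working <- data.frame(
  france   = c(1, 1, 1, 1, 0, 0, 0, 0),
  manual   = c(1, 1, 0, 0, 1, 1, 0, 0),
  famanual = c(1, 0, 1, 0, 1, 0, 1, 0),
  total    = c(107, 65, 66, 171, 87, 65, 85, 148),
  working  = c(85, 44, 24, 17, 24, 22, 1, 6),
  no       = c(22, 21, 42, 154, 63, 43, 84, 142)
)

# Response built with matrix(), as in the call above. Inside the formula,
# 'working' and 'no' are looked up as columns of the 'data' argument.
fit_matrix <- glm(matrix(c(working, no), ncol = 2) ~ france + manual + famanual,
                  data = working, family = binomial)

# Same response built with cbind(): column 1 = successes, column 2 = failures.
fit_cbind <- glm(cbind(working, no) ~ france + manual + famanual,
                 data = working, family = binomial)

# TRUE when the two fits have the same coefficients.
all.equal(coef(fit_matrix), coef(fit_cbind))
```

Either form works because the terms are evaluated inside the data frame, where 
'working' and 'no' are plain columns of equal length.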



Does that help to clarify?

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
