Re: [R] Questions on factors in regression analysis

David Winsemius Thu, 20 Aug 2009 11:17:48 -0700


On Aug 20, 2009, at 1:46 PM, g...@ucalgary.ca wrote:

I have two questions on factors in regression:

Q1.
In a table, there are a few categorical/factor variables and a few numerical
variables, and the response variable is numeric. Some factors are important
but others are not.
How can I determine which categorical variables are significant for the
response variable?

Seems that you should engage the services of a consulting statistician for that sort of question. Or post in a venue where statistical consulting is supposed to occur, such as one of the sci.stat.* newsgroups.

Q2.
As we know, lm can deal with categorical variables.
I thought that, when there is a categorical predictor, we may use lm directly without quantifying these factors, and that assigning different values to factors
would not change the fits, as shown:

The "numbers" that you are attempting to assign are really just labels for the factor levels. The regression functions in R will not use them for any calculations. They should not be thought of as having "values". Even if the factor is an ordered factor, the labels may not be interpretable as having the same numerical order as the string values might suggest.
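A minimal sketch of that point, using simulated data: relabeling the levels of a factor with digit strings leaves the fit unchanged, because only the level membership enters the calculation. The names f_words, f_digits, y, fit_words and fit_digits are illustrative and do not come from the original post.

```r
# Hypothetical data; only the level labels differ between the two factors.
set.seed(1)
x <- 1:20
f_words  <- gl(2, 10, 20, labels = c("yes", "no"))
f_digits <- f_words
levels(f_digits) <- c("18.8", "29.9")   # relabel only; the data are unchanged
y <- 0.8 * x + 18 + ifelse(f_words == "no", 10, 0) + rnorm(20)

fit_words  <- lm(y ~ x + f_words)
fit_digits <- lm(y ~ x + f_digits)

# Same coefficients (apart from their names) and same fitted values
all.equal(unname(coef(fit_words)), unname(coef(fit_digits)))
all.equal(fitted(fit_words), fitted(fit_digits))

# The internal codes are 1 and 2, whatever the labels say
unique(as.integer(f_digits))
```

The digit strings "18.8" and "29.9" never enter the arithmetic; they only name the coefficient that lm reports.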

x <- 1:20 ## numeric predictor
yes.no <- c("yes","no")
factors <- gl(2,10,20,yes.no) ##factor predictor
factors.quant <- rep(c(18.8,29.9),c(10,10)) ##quantification of factors

Not sure what that is supposed to mean. It is not a factor object, even though you may be misleading yourself into believing it should be. It's a numeric vector.

> str(factors.quant)
 num [1:20] 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 ...
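One way to see the difference is to compare the design matrices R builds for each predictor. The sketch below reuses the object names from the quoted code; mm_fact and mm_num are illustrative names added here.

```r
x <- 1:20
factors <- gl(2, 10, 20, labels = c("yes", "no"))   # factor predictor
factors.quant <- rep(c(18.8, 29.9), c(10, 10))      # numeric vector with ties

is.factor(factors)         # TRUE
is.factor(factors.quant)   # FALSE
is.numeric(factors.quant)  # TRUE

# Design matrices as lm() would construct them under default contrasts
mm_fact <- model.matrix(~ x + factors)
mm_num  <- model.matrix(~ x + factors.quant)

colnames(mm_fact)                  # "(Intercept)" "x" "factorsno"
unique(mm_fact[, "factorsno"])     # 0/1 indicator for level "no"
unique(mm_num[, "factors.quant"])  # the raw numbers 18.8 and 29.9
```

The factor contributes a 0/1 indicator column, while the numeric vector enters with its actual values, so the reported coefficients are on different scales.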

factors.quant.1 <- rep(c(16.9,38.9),c(10,10)) ##second quantification of factors
response <- 0.8*x + 18 + factors.quant + rnorm(20) ##response
lm.quant <- lm(response ~ x + factors.quant) ##lm with quantifications
lm.fact <- lm(response ~ x + factors) ##lm with factors


> lm.quant

Call:
lm(formula = response ~ x + factors.quant)

Coefficients:
  (Intercept)              x  factors.quant
      14.9098         0.5385         1.2350

> lm.fact

Call:
lm(formula = response ~ x + factors)

Coefficients:
(Intercept)            x    factorsno
    38.1286       0.5385      13.7090

lm.quant.1 <- lm(response ~ x + factors.quant.1) ##lm with quantifications


> lm.quant.1

Call:
lm(formula = response ~ x + factors.quant.1)

Coefficients:
    (Intercept)                x  factors.quant.1
        27.5976           0.5385           0.6231

lm.fact.1 <- lm(response ~ x + factors) ##lm with factors

par(mfrow=c(2,2)) ## comparisons of two fittings
plot(x, response)
lines(x,fitted(lm.quant),col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact),col = "red")
grid()
plot(x, response)
lines(x,fitted(lm.quant.1),lty =2,col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact.1),lty =2,col = "red")
grid()
par(mfrow = c(1,1))

So, is it right that we can assign any numeric values to factors,
for example, c(yes, no) = c(18.8,29.9) or (16.9,38.9) in the above,
before doing lm, glm, aov, even nls?

You can give factor levels any name you like, including any sequence of digit characters. Unlike "ordinary" R, where unquoted numbers cannot start variable names, factor functions will coerce numeric vectors to character vectors when assigning level names. But you seem to be conflating factors with numeric vectors that have many ties. Those two entities would have different handling by R's regression functions.
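A hedged illustration of the different handling, on simulated data with a three-level grouping (the two-level case in the quoted code happens to give identical fitted values, which can hide the distinction). The names g, g_num, y, fit_factor, fit_numeric and f, and the coding 1, 2, 3, are assumptions made for this sketch only.

```r
set.seed(42)
g     <- gl(3, 10, labels = c("low", "mid", "high"))  # factor with 3 levels
g_num <- c(1, 2, 3)[as.integer(g)]                    # numeric vector with many ties
y     <- c(0, 5, 1)[as.integer(g)] + rnorm(30)        # group means not linear in the coding

fit_factor  <- lm(y ~ g)      # two indicator columns: one mean per level
fit_numeric <- lm(y ~ g_num)  # a single slope: means forced onto a straight line

length(coef(fit_factor))      # 3 coefficients
length(coef(fit_numeric))     # 2 coefficients
isTRUE(all.equal(fitted(fit_factor), fitted(fit_numeric)))  # FALSE for these data

# Numeric input to factor() is stored as character level names
f <- factor(c(18.8, 29.9, 18.8))
levels(f)                     # "18.8" "29.9"
as.integer(f)                 # internal codes: 1 2 1
as.numeric(as.character(f))   # recovers 18.8 29.9 18.8
```

With only two distinct values, a numeric predictor and a two-level factor span the same model space, so the fits coincide. With three or more distinct values they generally do not.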


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
