Thanks!

> On Aug 20, 2009, at 1:46 PM, g...@ucalgary.ca wrote:
>
>> I got two questions on factors in regression:
>>
>> Q1.
>> In a table, there are a few categorical/factor variables, a few
>> numerical variables, and the response variable is numeric. Some
>> factors are important but others are not.
>> How to determine which categorical variables are significant to the
>> response variable?
>
> Seems that you should engage the services of a consulting statistician
> for that sort of question. Or post in a venue where statistical
> consulting is supposed to occur, such as one of the sci.stat.*
> newsgroups.

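For Q1, one common starting point inside R (a sketch only, not a substitute for the statistical advice suggested above) is to test each term of a fitted lm as a whole with drop1(). The data and the names f1, f2, z and y below are simulated and hypothetical; they do not come from the table in question.

```r
## Sketch only: simulated data, hypothetical variable names (f1, f2, z, y).
set.seed(1)
n  <- 60
f1 <- gl(3, 20, n, labels = c("a", "b", "c"))   # factor that does affect y
f2 <- gl(2, 1, n, labels = c("u", "v"))         # factor unrelated to y
z  <- rnorm(n)                                  # numeric predictor
y  <- 2 * z + c(0, 1.5, 3)[as.integer(f1)] + rnorm(n)

fit <- lm(y ~ z + f1 + f2)

## F test for dropping each term in turn. A factor is tested as a whole
## (all of its dummy columns together), not level by level.
tab <- drop1(fit, test = "F")
print(tab)
```

Each row of the table compares the full model with the model lacking that one term, so a small p-value in the f1 row says the factor as a whole matters given the other predictors. Whether such tests are the right tool for a particular data set is exactly the kind of question a statistician would want to discuss.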
I googled sci.stat.* and found sci.stat.math and sci.stat.consult. Are
they good places to ask? I have no idea how to do this, so any clue
will be appreciated.

>> Q2.
>> As we know, lm can deal with categorical variables.
>> I thought that, when there is a categorical predictor, we may use lm
>> directly without quantifying these factors, and that assigning
>> different values to factors would not change the fittings, as shown:
>
> The "numbers" that you are attempting to assign are really just labels
> for the factor levels. The regression functions in R will not use them
> for any calculations. They should not be thought of as having
> "values". Even if the factor is an ordered factor, the labels may not
> be interpretable as having the same numerical order as the string
> values might suggest.
>
>> x <- 1:20                                     ## numeric predictor
>> yes.no <- c("yes","no")
>> factors <- gl(2,10,20,yes.no)                 ## factor predictor
>> factors.quant <- rep(c(18.8,29.9),c(10,10))   ## quantification of factors
>
> Not sure what that is supposed to mean. It is not a factor object even
> though you may be misleading yourself into believing it should be.
> It's a numeric vector.

Yes, levels are not numeric but just labels. But after the factor
levels were assigned numeric values as factors.quant and
factors.quant.1, lm(response ~ x + factors.quant) and
lm(response ~ x + factors.quant.1) produced the same fitted curve as
lm(response ~ x + factors). This is what I could not understand.

> > str(factors.quant)
>  num [1:20] 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 ...
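The difference between the two objects can be seen in the design matrix that lm() would build from each of them. The snippet below is a minimal sketch that rebuilds the two predictors from the post so that it runs on its own; the names mm.fact and mm.quant are introduced here for illustration only.

```r
## Objects as defined in the post, rebuilt so the snippet is self-contained.
yes.no        <- c("yes", "no")
factors       <- gl(2, 10, 20, yes.no)            # factor with levels yes, no
factors.quant <- rep(c(18.8, 29.9), c(10, 10))    # plain numeric vector

print(is.factor(factors))         # a factor
print(is.factor(factors.quant))   # not a factor: lm() treats it as a number

## Design matrices (mm.fact and mm.quant are illustrative names)
mm.fact  <- model.matrix(~ factors)        # intercept + 0/1 dummy column "factorsno"
mm.quant <- model.matrix(~ factors.quant)  # intercept + the numeric values as given

## With only two distinct values, the numeric column is an exact linear
## function of the dummy column: 18.8 + (29.9 - 18.8) * dummy
lin.check <- all.equal(unname(mm.quant[, 2]),
                       18.8 + (29.9 - 18.8) * unname(mm.fact[, 2]))
print(lin.check)
```

So the factor is never given the values 18.8 and 29.9; under the default treatment contrasts it becomes a 0/1 indicator for the level "no". The numeric vector happens to be a shifted and rescaled copy of that indicator, which is a property of having exactly two distinct values.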
>
>> factors.quant.1 <- rep(c(16.9,38.9),c(10,10))        ## second quantification of factors
>> response <- 0.8*x + 18 + factors.quant + rnorm(20)   ## response
>> lm.quant <- lm(response ~ x + factors.quant)         ## lm with quantifications
>> lm.fact <- lm(response ~ x + factors)                ## lm with factors
>
> > lm.quant
>
> Call:
> lm(formula = response ~ x + factors.quant)
>
> Coefficients:
>   (Intercept)              x  factors.quant
>       14.9098         0.5385         1.2350
>
> > lm.fact
>
> Call:
> lm(formula = response ~ x + factors)
>
> Coefficients:
> (Intercept)            x    factorsno
>     38.1286       0.5385      13.7090
>
>> lm.quant.1 <- lm(response ~ x + factors.quant.1)     ## lm with quantifications
>
> > lm.quant.1
>
> Call:
> lm(formula = response ~ x + factors.quant.1)
>
> Coefficients:
>     (Intercept)                x  factors.quant.1
>         27.5976           0.5385           0.6231
>
>> lm.fact.1 <- lm(response ~ x + factors)              ## lm with factors
>>
>> par(mfrow=c(2,2))                                    ## comparisons of two fittings
>> plot(x, response)
>> lines(x,fitted(lm.quant),col="blue")
>> grid()
>> plot(x,response)
>> lines(x,fitted(lm.fact),col = "red")
>> grid()
>> plot(x, response)
>> lines(x,fitted(lm.quant.1),lty =2,col="blue")
>> grid()
>> plot(x,response)
>> lines(x,fitted(lm.fact.1),lty =2,col = "red")
>> grid()
>> par(mfrow = c(1,1))
>>
>> So, is it right that we can assign any numeric values to factors,
>> for example, c(yes, no) = c(18.8,29.9) or (16.9,38.9) in the above,
>> before doing lm, glm, aov, even nls?
>
> You can give factor levels any name you like, including any sequence
> of digit characters. Unlike "ordinary" R, where unquoted numbers cannot
> start variable names, factor functions will coerce numeric vectors to
> character vectors when assigning level names. But you seem to be
> conflating factors with numeric vectors that have many ties. Those two
> entities would have different handling by R's regression functions.
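The printed coefficients already hint at why the fitted curves coincide: 1.2350 * (29.9 - 18.8) and 0.6231 * (38.9 - 16.9) both come to about 13.71, the factorsno coefficient. The sketch below checks this numerically and then shows that the agreement does not carry over to a factor with three levels. The seed, the three-level variables g3, g3.num and y3, and the result names are assumptions added for illustration and are not part of the original example.

```r
set.seed(42)                                    # arbitrary seed, not from the post
x               <- 1:20
factors         <- gl(2, 10, 20, c("yes", "no"))
factors.quant   <- rep(c(18.8, 29.9), c(10, 10))
factors.quant.1 <- rep(c(16.9, 38.9), c(10, 10))
response        <- 0.8 * x + 18 + factors.quant + rnorm(20)

lm.fact    <- lm(response ~ x + factors)
lm.quant   <- lm(response ~ x + factors.quant)
lm.quant.1 <- lm(response ~ x + factors.quant.1)

## Two levels: the three design matrices span the same column space,
## so the fitted values agree up to rounding error.
same.fit <- isTRUE(all.equal(fitted(lm.fact), fitted(lm.quant))) &&
            isTRUE(all.equal(fitted(lm.fact), fitted(lm.quant.1)))

## The factor coefficient equals the numeric slope times the gap between codes.
gap.check <- all.equal(unname(coef(lm.fact)["factorsno"]),
                       unname(coef(lm.quant)["factors.quant"]) * (29.9 - 18.8))

## Three levels (hypothetical extension): numeric codes 1, 2, 3 force the
## level effects onto a straight line, while a factor estimates them freely.
g3      <- gl(3, 10, 30, c("a", "b", "c"))
g3.num  <- as.numeric(g3)
y3      <- c(0, 5, 1)[as.integer(g3)] + rnorm(30)
differ3 <- !isTRUE(all.equal(fitted(lm(y3 ~ g3)), fitted(lm(y3 ~ g3.num))))

print(c(same.fit = same.fit, gap.check = isTRUE(gap.check), differ3 = differ3))
```

So the answer to the last question appears to be no in general. Replacing a two-level factor by any two distinct numbers gives the same fit in lm because only the intercept and slope are rescaled, but with three or more levels the numeric coding imposes a spacing between levels that the factor coding does not.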
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.