On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:

Hi,
I am trying to create a set of dummy variables to use within a multiple linear 
regression and am unable to find the codes within the manuals.

For example, I have:
Price   Weight   Clarity
                 IF   VVS1   VVS2
 500    8        1    0      0
1000    5.2      0    0      1
 864    3        0    1      0
 340    2.6      0    0      1
  90    0.5      1    0      0
 450    2.3      0    1      0

Where price is dependent upon weight (single value in each observation) and 
clarity (split into three levels, IF, VVS1, VVS2).
I am having trouble telling the program that clarity is a set of 3 dummy 
variables and keep getting error messages. What is the correct way?

You should code the categorical variable "Clarity" as a "factor" so that R knows that this is a categorical variable and can deal with it appropriately in subsequent computations such as summary() or lm().

Thus, I would recommend storing your data as

dat <- data.frame(
  Price = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

which yields, e.g.,

R> summary(dat)
     Price            Weight      Clarity
 Min.   :  90.0   Min.   :0.500   IF  :2
 1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
 Median : 475.0   Median :2.800   VVS2:2
 Mean   : 540.7   Mean   :3.600
 3rd Qu.: 773.0   3rd Qu.:4.650
 Max.   :1000.0   Max.   :8.000

and then you can also do

R> lm(Price ~ Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ Weight + Clarity, data = dat)

Coefficients:
(Intercept)       Weight  ClarityVVS1  ClarityVVS2
     -45.05        80.01       490.02       403.00

or if you wish to choose a different coding

R> lm(Price ~ 0 + Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

Coefficients:
     Weight    ClarityIF  ClarityVVS1  ClarityVVS2
      80.01       -45.05       444.97       357.95
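
To see which dummy columns R builds for each of the two codings, one can inspect the design matrix directly. The following is only a minimal sketch: it re-creates the data with Clarity stored explicitly as a factor, and the object names X1 and X0 are chosen here purely for illustration.

```r
# Same data as above, with Clarity stored explicitly as a factor.
dat <- data.frame(
  Price = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1")))

# Default coding: an intercept plus 0/1 dummies for every level except
# the first one ("IF"), which serves as the reference level.
X1 <- model.matrix(Price ~ Weight + Clarity, data = dat)
print(X1)

# Without an intercept: one 0/1 dummy per level of Clarity.
X0 <- model.matrix(Price ~ 0 + Weight + Clarity, data = dat)
print(X0)
```

The column names of X1 and X0 correspond to the coefficient names in the two lm() outputs above.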


Some further reading of introductory material on linear regression in R would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.
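
If the data are already stored with three separate 0/1 columns, as in the table in the question, they can be collapsed into a single factor before fitting. This is just one possible sketch: the data frame name raw, the helper vector lev, and the column Clarity2 are made up for illustration, and it assumes exactly one of the three dummies equals 1 in each row.

```r
# Hypothetical layout: the three dummies stored as separate 0/1 columns.
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0))

lev <- c("IF", "VVS1", "VVS2")

# max.col() gives, for each row, the index of the column holding the
# largest value, i.e. the dummy that is set to 1.
idx <- max.col(as.matrix(raw[, lev]), ties.method = "first")
raw$Clarity <- factor(lev[idx], levels = lev)

# relevel() changes the reference level used by the default coding.
raw$Clarity2 <- relevel(raw$Clarity, ref = "VVS2")
fit <- lm(Price ~ Weight + Clarity2, data = raw)
print(coef(fit))
```

Changing the reference level only changes how the Clarity effects are parameterized; the Weight coefficient and the fitted values stay the same.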

hth,
Z

Any help is greatly appreciated.
Matthew

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.