Re: [R] lm function in R

Something Something Sat, 13 Feb 2010 17:05:32 -0800

> From your question it is difficult to determine what sort of tutoring you
> are expecting.
The kind of tutoring you are providing is definitely helpful :)


I guess I should quickly explain what I am trying to accomplish.  I have
been a computer scientist for quite a few years, but I studied Statistics
a long time ago.  Now I am trying to get back into the Statistics field.

For my latest project I have been asked to implement Multiple Regression
Analysis (using interactions - I guess) in Java & Distributed Computing
(more specifically Hadoop).  I looked at the possibility of using rJava, but
its 'single threaded' model plus complex deployment on 1000s of machines
does not make it very appealing.  I looked at using 3rd party libraries such
as 'Apache Commons Math' & Flanagan's JAVA statistics library.  Both of them
use this formula:
Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk

and NOT the one I have been asked to implement - which is -
Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk + b12X12 + b13X13 + ... + c
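
For what it is worth, the difference between the two equations can be
checked in R itself by looking at the design matrix that a formula expands
to.  The sketch below is only illustrative: it reuses the six-row students
data quoted further down this thread, the names dat, X_add and X_int are
made up for the example, and it assumes that a term such as X12 stands for
the product X1*X2.

```r
# Students data from the example quoted later in this thread.
dat <- data.frame(
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# '+' keeps main effects only: columns (Intercept), x1, x2.
X_add <- model.matrix(y ~ x1 + x2, data = dat)

# '*' expands to x1 + x2 + x1:x2.  For numeric predictors the x1:x2
# column is the elementwise product of x1 and x2.
X_int <- model.matrix(y ~ x1 * x2, data = dat)

print(colnames(X_add))
print(colnames(X_int))
print(X_int)
```

Each column of the design matrix gets one coefficient, so the first formula
has three coefficients and the second has four.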

(Not to mention these tools do not use BigDecimal so the answers given by
them are not as precise as those from R).

So now the deadline is approaching, and I can't find any Java library that
matches the results from R, so I have to roll up my sleeves and start coding in
Java, which means I need to get up to speed on Y-Hat calculations very
quickly.  Ideally, I am looking for a paper (or a textbook) that gives
step-by-step instructions on calculating coefficients such as (b0, b1... b13).  I
have been searching but haven't found anything.  My last resort is to look
at the R source code and start converting it to Java.
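
As a starting point for those step-by-step calculations, here is a minimal
sketch of how least-squares coefficients can be computed at the matrix
level.  It is not a copy of what lm does internally: it builds the design
matrix by hand for the students data quoted further down this thread (with
one interaction column), solves the textbook normal equations
(X'X) b = X'y, and then repeats the fit with a QR decomposition, which is
the approach the quoted reply below points to.  The variable names
(X, b_normal, b_qr, b_lm, fitted_qr) are made up for the example; qr(),
qr.coef(), crossprod() and solve() are base R functions.

```r
# Students data from the example quoted later in this thread.
y  <- c(14, 23, 30, 50, 39, 67)
x1 <- c(4, 4, 7, 7, 10, 10)
x2 <- c(1, 2, 2, 4, 3, 6)

# Design matrix: a column of ones for b0, the two predictors,
# and their product for the interaction coefficient b12.
X <- cbind(1, x1, x2, x1 * x2)

# Textbook route: solve the normal equations (X'X) b = X'y.
b_normal <- solve(crossprod(X), crossprod(X, y))

# QR route: factor X = QR, then solve R b = Q'y.
# This is usually better conditioned than forming X'X.
b_qr <- qr.coef(qr(X), y)

# Reference fit from lm() for comparison.
b_lm <- coef(lm(y ~ x1 * x2))

# Fitted values are just X times the coefficient vector.
fitted_qr <- drop(X %*% b_qr)

print(cbind(normal = drop(b_normal), qr = b_qr, lm = b_lm))
print(fitted_qr)
```

If the sketch is right, all three columns should agree to within rounding,
and the fitted values should match the ones reported further down the
thread for Y ~ X1*X2 without as.factor.  A Java port would need the same
two pieces: code that builds the design matrix and a least-squares solver,
preferably QR-based.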

I am new to R and a little rusty on Statistics, so I apologize for all the
dumb questions, and GREATLY appreciate your patience and help.  Thanks.


On Sat, Feb 13, 2010 at 3:20 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Feb 13, 2010, at 5:03 PM, Something Something wrote:
>
>  I tried..
>>
>> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
>> formula(mod)
>>
>> This produced....
>> Y ~ X1 * X2 * X3
>>
>>
>> When I typed just mod I got:
>>
>> Call:
>> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)
>>
>> Coefficients:
>> (Intercept)          X11          X21          X31      X11:X21      X11:X31
>>    177.9245       0.2005       2.4482       3.1216       0.8127     -26.6166
>>     X21:X31  X11:X21:X31
>>     -3.0398      29.6049
>>
>>
>> I am trying to figure out how R computed all these coefficients.
>>
>
> From your question it is difficult to determine what sort of tutoring you
> are expecting. To get the code of an R function, you just type its name:
>
> lm
>
> Leads to lm.fit:
>
> lm.fit
>
> Reading further it appears the lm and lm.fit functions are really front
> ends for this call:
>
> .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
>        tol = as.double(tol), coefficients = mat.or.vec(p, ny),
>        residuals = y, effects = y, rank = integer(1L), pivot = 1L:p,
>        qraux = double(p), work = double(2 * p), PACKAGE = "base")
>
> Seems pretty likely that is a QR decomposition-based method that is
> implemented in compiled code.
>
> So if you want to go deeper, at least you know what to search for. Or if
> you want to know how regression works on a matrix level, you should consult
> a good reference text or Wikipedia, which is surprisingly good for that sort
> of question these days.
>
> --
> David.
>
>>
>>
>>
>>
>>
>> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter <gunter.ber...@gene.com>
>> wrote:
>>
>>  ?formula
>>>
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Statistics
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Something Something
>>> Sent: Saturday, February 13, 2010 1:24 PM
>>> To: Daniel Nordlund
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] lm function in R
>>>
>>> Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
>>> to '+'.
>>>
>>> Seems like when I put * it means - interaction & when I put + it's not an
>>> interaction.
>>>
>>> Is it correct to assume then that...
>>>
>>> When I put + R evaluates the following equation:
>>> Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk
>>>
>>>
>>> But when I put * R evaluates the following equation;
>>> Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk + b12X12 + b13X13 + ... + c
>>>
>>> Is this correct?  If it is then can someone point me to any sources that
>>> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
>>> calculated.  I guess, one source is the R source code :) but is there any
>>> other documentation anywhere?
>>>
>>> Please let me know.  Thanks.
>>>
>>>
>>>
>>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
>>> <djnordl...@verizon.net> wrote:
>>>
>>>> -----Original Message-----
>>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>>>> On Behalf Of Something Something
>>>>> Sent: Friday, February 12, 2010 5:28 PM
>>>>> To: Phil Spector; r-help@r-project.org
>>>>> Subject: Re: [R] lm function in R
>>>>>
>>>>> Thanks for the replies everyone.  Greatly appreciate it.  Some progress, but
>>>>> now I am getting the following values when I don't use "as.factor"
>>>>>
>>>>> 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
>>>>>
>>>>> Is that what you guys get?
>>>>>
>>>>
>>>>
>>>> If you look at Phil's response below, no, that is not what he got.  The
>>>> difference is that you are specifying an interaction, whereas Phil did not
>>>> (because the equation you initially specified did not include an
>>>> interaction).  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>>>>
>>>>
>>>>>
>>>>> On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
>>>>> <spec...@stat.berkeley.edu> wrote:
>>>>>
>>>>>  By converting the two variables to factors, you are fitting
>>>>>> an entirely different model.  Leave out the as.factor stuff
>>>>>> and it will work exactly as you want it to.
>>>>>>
>>>>>> > dat
>>>>>>   V1 V2 V3 V4
>>>>>> 1 s1 14  4  1
>>>>>> 2 s2 23  4  2
>>>>>> 3 s3 30  7  2
>>>>>> 4 s4 50  7  4
>>>>>> 5 s5 39 10  3
>>>>>> 6 s6 67 10  6
>>>>>> > names(dat) = c('id','y','x1','x2')
>>>>>> > z = lm(y~x1+x2,dat)
>>>>>> > predict(z)
>>>>>>        1        2        3        4        5        6
>>>>>> 15.16667 24.66667 27.66667 46.66667 40.16667 68.66667
>>>>>>
>>>>>>
>>>>>>                                      - Phil Spector
>>>>>>                                       Statistical Computing Facility
>>>>>>                                       Department of Statistics
>>>>>>                                       UC Berkeley
>>>>>>                                       spec...@stat.berkeley.edu
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 12 Feb 2010, Something Something wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am trying to learn how to perform Multiple Regression Analysis in R.  I
>>>>>>> decided to take a simple example given in this PDF:
>>>>>>> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
>>>>>>>
>>>>>>> I created a small CSV called students.csv that contains the following
>>>>>>> data:
>>>>>>>
>>>>>>> s1 14 4 1
>>>>>>> s2 23 4 2
>>>>>>> s3 30 7 2
>>>>>>> s4 50 7 4
>>>>>>> s5 39 10 3
>>>>>>> s6 67 10 6
>>>>>>>
>>>>>>> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
>>>>>>>
>>>>>>> Now the expected results are:
>>>>>>>
>>>>>>> yHat[0]:15.166666666666668
>>>>>>> yHat[1]:24.666666666666668
>>>>>>> yHat[2]:27.666666666666664
>>>>>>> yHat[3]:46.666666666666664
>>>>>>> yHat[4]:40.166666666666664
>>>>>>> yHat[5]:68.66666666666667
>>>>>>>
>>>>>>> This is based on the following equation (given in the PDF):
>>>>>>> Y = 1.67 + X1 + 9.50 X2
>>>>>>>
>>>>>>> I ran the following commands in R:
>>>>>>>
>>>>>>> data = read.table("students.csv", head=F, as.is=T, na.string=".",
>>>>>>> row.nam=NULL)
>>>>>>> X1 = as.factor(data[[3]])
>>>>>>> X2 = as.factor(data[[4]])
>>>>>>> Y = data[[2]]
>>>>>>> mod = lm(Y ~ X1*X2, na.action = na.exclude)
>>>>>>> Y.hat = fitted(mod)
>>>>>>> Y.hat
>>>>>>>
>>>>>>> This gives me the following output:
>>>>>>>
>>>>>>> > Y.hat
>>>>>>>  1  2  3  4  5  6
>>>>>>> 14 23 30 50 39 67
>>>>>>>
>>>>>>> Obviously I am doing something wrong.  Please help.  Thanks.
>>>>>>>
>>>>>>>
>>>> Hope this is helpful,
>>>>
>>>> Dan
>>>>
>>>> Daniel Nordlund
>>>> Bothell, WA USA
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>       [[alternative HTML version deleted]]
>>>
>>>
>>>
>>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
