>>From your question it is difficult to determine what sort of tutoring you are expecting. The kind of tutoring you are providing is definitely helpful :)
I guess I should quickly explain what I am trying to accomplish. I have been a computer scientist for quite a few years, but I studied Statistics long time ago. Now trying to get back into the Statistics field. For my latest project I have been asked to implement Multiple Regression Analysis (using interactions - I guess) in Java & Distributed Computing (more specifically Hadoop). I looked at the possibility of using rJava, but it's 'single threaded' model plus complex deployment on 1000s of machines does not make it very appealing. I looked at using 3rd party libraries such as 'Apache Commons Math' & Flanagan's JAVA statistics library. Both of them use this formula: Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk and NOT the one I have been asked to implement - which is - Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........ + c (Not to mention these tools do not use BigDecimal so the answers given by them are not as precise as those from R). So now the deadline is approaching... and I can't find any Java library that matches results from R.. so I have to roll up my sleeves and start coding in Java - which means I need to get up to speed on Y-Hat calculations very quickly. Ideally, looking for a paper (or a text book) that gives step by step instructions on calculating coefficients such as (b0, b1... b13). I have been searching but haven't found anything. My last resort is to look at R source code and start converting it to Java. I am new to R and a little rusty on Statistics, so I apologize for all the dumb questions, and GREATLY appreciate your patience and help. Thanks. On Sat, Feb 13, 2010 at 3:20 PM, David Winsemius <dwinsem...@comcast.net>wrote: > > On Feb 13, 2010, at 5:03 PM, Something Something wrote: > > I tried.. >> >> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude) >> formula(mod) >> >> This produced.... >> Y ~ X1 * X2 * X3 >> >> >> When I typed just mod I got: >> >> Call: >> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude) >> >> Coefficients: >> (Intercept) X11 X21 X31 X11:X21 >> X11:X31 >> X21:X31 X11:X21:X31 >> 177.9245 0.2005 2.4482 3.1216 0.8127 -26.6166 >> -3.0398 29.6049 >> >> >> I am trying to figure out how R computed all these coefficients. >> > > From your question it is difficult to determine what sort of tutoring you > are expecting. To get the code of an R formula, you just type its name: > > lm > > Leads to lm.fit: > > lm.fit > > Reading further it appears the lm and lm.fit functions are really front > ends for this call: > > .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny, > tol = as.double(tol), coefficients = mat.or.vec(p, ny), > residuals = y, effects = y, rank = integer(1L), pivot = 1L:p, > qraux = double(p), work = double(2 * p), PACKAGE = "base") > > Seems pretty likely that is a QR decomposition-based method that i > implemented in compiled code. > > So if you want to go deeper, at least you know what to search for. Or if > you want to know how regression works on a matrix level, you should consult > a good reference text or Wikipedia, which is surprisingly good for that sort > of question these days. > > -- > David. > >> >> >> >> >> >> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter <gunter.ber...@gene.com> >> wrote: >> >> ?formula >>> >>> >>> Bert Gunter >>> Genentech Nonclinical Statistics >>> >>> -----Original Message----- >>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >>> On >>> Behalf Of Something Something >>> Sent: Saturday, February 13, 2010 1:24 PM >>> To: Daniel Nordlund >>> Cc: r-help@r-project.org >>> Subject: Re: [R] lm function in R >>> >>> Thanks Dan. Yes that was very helpful. I didn't see the change from '*' >>> to >>> '+'. >>> >>> Seems like when I put * it means - interaction & when I put + it's not an >>> interaction. >>> >>> Is it correct to assume then that... >>> >>> When I put + R evaluates the following equation: >>> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk >>> >>> >>> But when I put * R evaluates the following equation; >>> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........ + c >>> >>> Is this correct? If it is then can someone point me to any sources that >>> will explain how the coefficients (such as b0... bk, b12.. , b123..) are >>> calculated. I guess, one source is the R source code :) but is there any >>> other documentation anywhere? >>> >>> Please let me know. Thanks. >>> >>> >>> >>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund >>> <djnordl...@verizon.net>wrote: >>> >>> -----Original Message----- >>>>> From: r-help-boun...@r-project.org [mailto: >>>>> >>>> r-help-boun...@r-project.org] >>> >>>> On Behalf Of Something Something >>>>> Sent: Friday, February 12, 2010 5:28 PM >>>>> To: Phil Spector; r-help@r-project.org >>>>> Subject: Re: [R] lm function in R >>>>> >>>>> Thanks for the replies everyone. Greatly appreciate it. Some >>>>> >>>> progress, >>> >>>> but >>>>> now I am getting the following values when I don't use "as.factor" >>>>> >>>>> 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667 >>>>> >>>>> Is that what you guys get? >>>>> >>>> >>>> >>>> If you look at Phil's response below, no, that is not what he got. The >>>> difference is that you are specifying an interaction, whereas Phil did >>>> >>> not >>> >>>> (because the equation you initially specified did not include an >>>> interaction. Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula. >>>> >>>> >>>>> >>>>> On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector >>>>> <spec...@stat.berkeley.edu>wrote: >>>>> >>>>> By converting the two variables to factors, you are fitting >>>>>> an entirely different model. Leave out the as.factor stuff >>>>>> and it will work exactly as you want it to. >>>>>> >>>>>> dat >>>>>> >>>>>>> >>>>>>> V1 V2 V3 V4 >>>>>> 1 s1 14 4 1 >>>>>> 2 s2 23 4 2 >>>>>> 3 s3 30 7 2 >>>>>> 4 s4 50 7 4 >>>>>> 5 s5 39 10 3 >>>>>> 6 s6 67 10 6 >>>>>> >>>>>> names(dat) = c('id','y','x1','x2') >>>>>>> z = lm(y~x1+x2,dat) >>>>>>> predict(z) >>>>>>> >>>>>>> 1 2 3 4 5 6 15.16667 >>>>>> >>>>> 24.66667 >>> >>>> 27.66667 46.66667 40.16667 68.66667 >>>>>> >>>>>> >>>>>> - Phil Spector >>>>>> Statistical Computing >>>>>> >>>>> Facility >>> >>>> Department of Statistics >>>>>> UC Berkeley >>>>>> spec...@stat.berkeley.edu >>>>>> >>>>>> >>>>>> >>>>>> On Fri, 12 Feb 2010, Something Something wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>>> >>>>>>> I am trying to learn how to perform Multiple Regression Analysis in >>>>>>> >>>>>> R. >>> >>>> I >>>>> >>>>>> decided to take a simple example given in this PDF: >>>>>>> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf >>>>>>> >>>>>>> I created a small CSV called, students.csv that contains the >>>>>>> >>>>>> following >>> >>>> data: >>>>>>> >>>>>>> s1 14 4 1 >>>>>>> s2 23 4 2 >>>>>>> s3 30 7 2 >>>>>>> s4 50 7 4 >>>>>>> s5 39 10 3 >>>>>>> s6 67 10 6 >>>>>>> >>>>>>> Col headers: Student id, Memory span(Y), age(X1), speech rate(X2) >>>>>>> >>>>>>> Now the expected results are: >>>>>>> >>>>>>> yHat[0]:15.166666666666668 >>>>>>> yHat[1]:24.666666666666668 >>>>>>> yHat[2]:27.666666666666664 >>>>>>> yHat[3]:46.666666666666664 >>>>>>> yHat[4]:40.166666666666664 >>>>>>> yHat[5]:68.66666666666667 >>>>>>> >>>>>>> This is based on the following equation (given in the PDF): Y = >>>>>>> >>>>>> 1.67 >>> >>>> + >>>> >>>>> X1 >>>>> >>>>>> + >>>>>>> 9.50 X2 >>>>>>> >>>>>>> I ran the following commands in R: >>>>>>> >>>>>>> data = read.table("students.csv", head=F, as.is=T, na.string=".", >>>>>>> row.nam=NULL) >>>>>>> X1 = as.factor(data[[3]]) >>>>>>> X2 = as.factor(data[[4]]) >>>>>>> Y = data[[2]] >>>>>>> mod = lm(Y ~ X1*X2, na.action = na.exclude) >>>>>>> Y.hat = fitted(mod) >>>>>>> Y.hat >>>>>>> >>>>>>> This gives me the following output: >>>>>>> >>>>>>> Y.hat >>>>>>> >>>>>>>> >>>>>>>> 1 2 3 4 5 6 >>>>>>> 14 23 30 50 39 67 >>>>>>> >>>>>>> Obviously I am doing something wrong. Please help. Thanks. >>>>>>> >>>>>>> >>>> Hope this is helpful, >>>> >>>> Dan >>>> >>>> Daniel Nordlund >>>> Bothell, WA USA >>>> >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> >>> >>> >> [[alternative HTML version deleted]] >> >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.