Bill, Yes! The time variable was the issue. I should have mentioned that, but I didn't want to open the door to the fact that I should probably be using time series analysis instead of linear regression.
Because of the time offset, I felt comfortable subtracting out the minimum of the observed x. It makes sense to deal with the number of seconds over the interval rather than the number of seconds since the beginning of time (figuratively). I was just surprised that I needed to take out the offset, especially when using the actual unconverted POSIXct data as the X.

Although "orthogonal" and "polynomial" are both familiar words to me, I don't have a good enough understanding of orthogonal polynomials to explain them clearly, so I'm not inclined to use that. I'll probably read up on it more at some point. I'm sure there are some great resources out there; I'd look for something authored by John Fox for starters.

Since I'm looking for very simple and easily interpreted coefficients, removing the offset is the method I'll stick with. But thanks for the poly example. That was interesting, it might work for someone else, and I might return to it later.

Thanks again,

Gene

On Mon, Apr 16, 2012 at 9:55 AM, William Dunlap <wdun...@tibco.com> wrote:
> Instead of changing the tolerance of the qr decomposition
> you could use poly(x1,df=1) to center and scale x1. The coefficients
> will be different but the predictions will be fine (because the
> centering and scaling information is stored in the fitted model
> as the "predvars" attribute of the terms component):
>
> > predict(lm(y~poly(x1,df=1)), newdata=list(x1=min(x1)+10))
>        1
> 129.4292
> > attr(terms(lm(y~poly(x1,df=1))), "predvars")
> list(y, poly(x1, df = 1, coefs = list(alpha = 1334009455.477,
>     norm2 = c(1, 9, 9131.28718957214))))
>
> vs. doing the offset by hand:
>
> > x2 <- x1 - min(x1)
> > predict(lm(y~x2), newdata=list(x2=10))
>        1
> 129.4292
>
> I've wondered if lm() should automatically precondition variables
> of any of the time-related classes by removing the minimum,
> and perhaps divide by a scale factor. Big offsets are a common
> problem with time variables.
> (Your x1 is close to the number
> of seconds from 1970 until last week, which is what as.numeric(POSIXct)
> would give.)
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> > Of Gene Leynes
> > Sent: Monday, April 16, 2012 12:42 AM
> > To: peter dalgaard
> > Cc: r-help
> > Subject: Re: [R] Seemingly simple "lm" giving unexpected results
> >
> > Peter,
> >
> > This is exactly the answer I wanted.
> >
> > 1) I was a little fuzzy on how the qr decomposition was creating the "error"
> > 2) I wanted to know if it was possible to just change a setting to get
> > around the "error". Changing the tol in lm makes a lot more sense to me
> > than changing the global eps.
> >
> > Also, I love the "<lecture on numerical linear algebra>" tags. I wish that
> > conceptual structure were a common convention in human language.
> >
> > Thanks,
> >
> > Gene
> >
> > On Sat, Apr 14, 2012 at 2:45 PM, peter dalgaard <pda...@gmail.com> wrote:
> > >
> > > On Apr 14, 2012, at 14:40 , Berend Hasselman wrote:
> > >
> > > >
> > > > On 13-04-2012, at 22:20, Gene Leynes wrote:
> > > >
> > > >> I can't figure out why this is returning an NA for the slope in one case,
> > > >> but not in the other.
> > > >>
> > > >> I can tell that R thinks the first case is singular, but why isn't the
> > > >> second?
> > > >>
> > > >> ## Define X and Y
> > > >> ## There are two versions of x
> > > >> ## 1) "as is"
> > > >> ## 2) shifted to start at 0
> > > >> y = c(58, 57, 57, 279, 252, 851, 45, 87, 47)
> > > >> x1 = c(1334009411.437, 1334009411.437, 1334009411.437, 1334009469.297,
> > > >>        1334009469.297, 1334009469.297, 1334009485.697, 1334009485.697,
> > > >>        1334009485.697)
> > > >> x2 = x1 - min(x1)
> > > >>
> > > >> ## Why doesn't the lm model work for the "as is" x?
> > > >> lm(y~x1)
> > > >> lm(y~x2)
> > > >
> > > >
> > > > With your data the matrix t(X)%*%X is extremely ill-conditioned for the
> > > > case with x1.
> > > > See
> > > >
> > > > http://en.wikipedia.org/wiki/Condition_number
> > > > http://mathworld.wolfram.com/ConditionNumber.html
> > > > http://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares
> > > >
> > > > You can check it with
> > > >
> > > > makeXX <- function(x) {
> > > >   matrix(data=c(x, rep(1, length(x))), nrow=length(x), byrow=FALSE)
> > > > }
> > > >
> > > > X1 <- makeXX(x1)
> > > > (XTX1 <- t(X1) %*% X1)
> > > > svd(XTX1)
> > > >
> > > > and similar for x2.
> > >
> > > <lecture on numerical linear algebra>
> > >
> > > lm() is actually a bit smarter than to use the textbook formula
> > > (X'X)^{-1}X'Y; it is internally based on a QR decomposition, which is
> > > numerically far more stable. What it does is effectively to orthogonalize
> > > the columns of X successively.
> > >
> > > > x <- cbind(1, x1)
> > > > qr(x)
> > > $qr
> > >                            x1
> > > [1,] -3.0000000 -4.002028e+09
> > > [2,]  0.3333333  9.555777e+01
> > > [3,]  0.3333333  3.456548e-01
> > > [4,]  0.3333333 -2.598428e-01
> > > [5,]  0.3333333 -2.598428e-01
> > > [6,]  0.3333333 -2.598428e-01
> > > [7,]  0.3333333 -4.314668e-01
> > > [8,]  0.3333333 -4.314668e-01
> > > [9,]  0.3333333 -4.314668e-01
> > >
> > > $rank
> > > [1] 1
> > >
> > > $qraux
> > > [1] 1.333333 1.345655
> > >
> > > $pivot
> > > [1] 1 2
> > >
> > > attr(,"class")
> > > [1] "qr"
> > >
> > > However, notice the rank of 1. That's because it wants to protect the user
> > > against unwittingly plugging in a singular design matrix, so when the
> > > algorithm encounters a variable that looks like it is getting reduced to
> > > zero after orthogonalization, it basically throws it out.
> > >
> > > You can defeat this mechanism by setting the tol= argument sufficiently
> > > low. In fact, you can do the same with lm itself:
> > >
> > > > lm(y ~ x1, tol = 0)
> > > ...
> > > (Intercept)           x1
> > >  -2.474e+09    1.854e+00
> > >
> > > Notice that the slope is consistent with the one from
> > >
> > > > lm(y ~ x2)
> > > ...
> > > (Intercept)           x2
> > >     110.884        1.854
> > >
> > > In contrast, if you try the textbook way:
> > >
> > > > solve(crossprod(x), crossprod(x,y))
> > > Error in solve.default(crossprod(x), crossprod(x, y)) :
> > >   system is computationally singular: reciprocal condition number =
> > > 2.67813e-34
> > >
> > > Ah, well, let's try and defeat that:
> > >
> > > > solve(crossprod(x), crossprod(x,y), tol=0)
> > >              [,1]
> > >     -2.959385e+09
> > > x1   2.218414e+00
> > >
> > > Which, as you'll notice, is, er, somewhat off the mark.
> > >
> > > </lecture on numerical linear algebra>
> > > --
> > > Peter Dalgaard, Professor,
> > > Center for Statistics, Copenhagen Business School
> > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > > Phone: (+45)38153501
> > > Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
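As a postscript to the thread: Berend's condition-number point can be checked directly with base R's kappa() instead of inspecting svd() output by hand. This is a minimal sketch (not part of the original messages), using the thread's y, x1, and x2:

```r
## Condition numbers of the two design matrices from the thread.
y  <- c(58, 57, 57, 279, 252, 851, 45, 87, 47)
x1 <- rep(c(1334009411.437, 1334009469.297, 1334009485.697), each = 3)
x2 <- x1 - min(x1)          # offset removed, as in Gene's fix

X1 <- cbind(1, x1)          # design matrix with raw "seconds since 1970"
X2 <- cbind(1, x2)          # design matrix after removing the offset

kappa(X1, exact = TRUE)     # huge (~6e16): at the edge of double precision
kappa(X2, exact = TRUE)     # modest (~93): perfectly well behaved
```

Note that squaring the condition number of X1, as forming t(X)%*%X does, gives a reciprocal condition number on the order of the 2.67813e-34 reported in Peter's solve() error.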
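Bill's closing idea, that lm() could precondition time-class variables by removing the minimum, can be sketched as a small wrapper. precondition_time below is a hypothetical helper (not an existing R function), shown only to make the idea concrete:

```r
## Hypothetical helper: rescale a time-class (or numeric) predictor to
## "seconds since the first observation" before fitting, removing the
## large offset that makes the design matrix ill-conditioned.
precondition_time <- function(x) {
  x <- as.numeric(x)          # POSIXct -> seconds since the epoch
  x - min(x, na.rm = TRUE)    # subtract the offset
}

y   <- c(58, 57, 57, 279, 252, 851, 45, 87, 47)
x1  <- rep(c(1334009411.437, 1334009469.297, 1334009485.697), each = 3)
fit <- lm(y ~ precondition_time(x1))
coef(fit)   # intercept ~110.884, slope ~1.854, matching lm(y ~ x2)
```

One caveat: unlike poly(), this helper recomputes min(x) whenever the formula is evaluated, so predict() on new data would use the wrong offset. Storing the centering constants in the model, via the "predvars" mechanism Bill describes, is exactly what fixes that.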