[Rd] qr.coef: permutes dimnames; inserts NA; promises minimum-length (PR#9623)

2007-04-20 Thread brech
Full_Name: Christian Brechbuehler
Version: 2.4.1 Patched (2007-03-25 r40917)
OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS
Submission from: (NULL) (24.61.47.236)


Splus and R have different ideas about what qr.coef(qr()) should return,
which is fine... but I believe that R has a bug in that it is not
internally consistent, and another separate bug in the documentation. 

In particular, on this problem, Splus generates a permuted solution
vector:

Splus 6.2.1:
> A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
"one")))
> A
 zero one 
[1,]0   1
[2,]0   1
[3,]0   1
> y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
> y
 y 
[1,] 6
[2,] 7
[3,] 8
> x <- qr.coef(qr(A), y)
> x
 y 
 one 7
zero 0
> 

To my taste, the answer from Splus is unexpected; I was looking for
this:

> x[qr(A)$pivot,,drop=F]
 y 
zero 0
 one 7

But at least the answer is internally consistent: the values and the
row names were scrambled in corresponding ways.

However, R seems to helpfully un-permute the data values, but forgets
to un-permute the dimnames, which seems broken.  

R version 2.4.1 Patched (2007-03-25 r40917)
> A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
"one")))
> y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
> x <- qr.coef(qr(A), y)
> x
  y
one  NA
zero  7

I think the answer from qr.coef() should look more like this:

> x.wanted
  y
zero NA
one   7
> 

In addition, R uses NA to fill in rather than zero.  NA for this
problem above might mean "undefined": any value will do.  But given
the application of this function, where the solution might be
multiplied by the matrix, any NA will cause that to turn into an NA
result... in math, zero times anything is zero, but in R, NA times
anything (even zero) is NA.  This seems somewhat inconvenient.  So I'd really
like
R to return this:

> x
  y
zero  0
one   7

More on the scrambling; I see in dqrsl.f that:

c   if pivoting was requested in dqrdc, the j-th
c   component of b will be associated with column jpvt(j)
c   of the original matrix x that was input into dqrdc.)

which I think is referring to the helpful un-permuting, but I think
qr.coef() in R needs to correspondingly un-permute the dimnames as
well?

  Code from qr.coef: 
if (!is.null(nam <- colnames(qr$qr))) 
rownames(coef) <- nam
  Proposed fix: That second line could be changed to
rownames(coef)[qr$pivot] <- nam


Another issue (either with the documentation or the code) is as follows.
In R, help(qr) promises:
   
   'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
   solves systems of equations via the QR decomposition: if 'a' is a
   QR decomposition it is the same as 'solve.qr', but if 'a' is a
   rectangular matrix the QR decomposition is computed first.  Either
   will handle over- and under-determined systems, providing a
   minimal-length solution or a least-squares fit if appropriate.
   
So unlike Splus, R promises a minimal-length solution, but I don't
think it delivers that.  For example, let's get the minimal-length
solution for the following under-determined system:
   
 x1 + 2*x2  ==  5
   
qr.solve would fail due to the singular matrix, so use qr.coef:
   
 > qr.coef(qr(t(1:2)),c(5))
   
   R:
 [1]  5 NA
   
   Splus:
 [1] 5 0
   
1*5 + 2*0 does equal 5, so modulo NA, this is a solution, but it is
NOT minimal-length.  OTOH, svd gives the right answer in both Splus and
R:
   
 > d <- svd(t(1:2));  c(5) %*% d$u %*% diag(1/d$d, length(d$d)) %*% t(d$v)
  [,1] [,2]
 [1,]12
   
This *is* minimum length, and 1*1 + 2*2 == 5.  So either qr.coef(qr())
should deliver that, or the documentation should clarify when a
minimal-length solution is delivered?

And revisiting the NA issue... in this problem, for the qr.coef()
result to be a solution, that 2nd value MUST be zero, no other values
will work.  So the NA seems wrong here, more clearly than in the other
example.

Finally, a simpler test case that shows the same issues with a 2x1
example instead of 2x3; any fix to qr.coef() should handle this as
well:

> qr.coef(qr(matrix(0:1,1,dimnames=list(NULL, c("zero","one",5)
 one zero 
  NA5 

I think that should return:

 zero  one
05 

or at least:

 zero  one
   NA5

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] (PR#9623) qr.coef: permutes dimnames; inserts NA; promises

2007-05-03 Thread brech
> From: Prof Brian Ripley <[EMAIL PROTECTED]>
> Date: Tue, 1 May 2007 15:01:51 +0100 (BST)
> 
> On Thu, 19 Apr 2007, [EMAIL PROTECTED] wrote:
> 
> > Full_Name: Christian Brechbuehler
> > Version: 2.4.1 Patched (2007-03-25 r40917)
> > OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS
> > Submission from: (NULL) (24.61.47.236)
> >
> >
> > I believe that R has a bug in that it is not
> > internally consistent, and another separate bug in the documentation.
> 
> I agree with the bug in the dimnames, and and have corrected that in 2.5.0 
> patched.

I see it in svn:

   if(!is.null(nam <- colnames(qr$qr)))
  -   rownames(coef) <- nam
  +if(k < p) rownames(coef)[qr$pivot] <- nam
  +else rownames(coef) <- nam
  +

Thank you!


> But I think the rest stems from the following misunderstanding:
> 
> > in math, zero times anything is zero, but in R, NA times
> > anything (even zero) is NA.  This seems somewhat inconvenient.
> 
> That just is not true.

OK, I accept that.

> Stemming from this, what R is reporting is that certain columns are not 
> used in the calculation.

OK, I understand qr.coef indicates with NA that dcrdc2 decided to
exclude the corresponding columns (because of linear dependency).

I still think the documentation is misleading -- trimming to the essence:

> > help(qr):
> >
> >   'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
> >   solves systems of equations via the QR decomposition: if 'a' is a
> >   QR decomposition it is the same as 'solve.qr', but if 'a' is a
> >   rectangular matrix the QR decomposition is computed first.  Either
> >   will handle over- and under-determined systems, providing a
> >   minimal-length solution or a least-squares fit if appropriate.

(A)
'qr.solve' and 'solve.qr' will NOT handle under-determined systems.
They both perform this check:

if (a$rank != nc) 
stop("singular matrix 'a' in solve")

But 'qr.coef', which they call when all is well, does.

(B)
Help promises a minimal-length solution, but QR does not deliver that.

> > > qr.coef(qr(t(1:2)), 5)
> >
> > [1]  5 NA

> > 1*5 + 2*0 does equal 5, so this is a solution, but it is
> > NOT minimal-length.  OTOH
> >
> > > d <- svd(t(1:2));  5 %*% d$u %*% (d$d^-1 * t(d$v))
> >  [,1] [,2]
> > [1,]12
> >
> > This *is* minimum length, and 1*1 + 2*2 == 5.
> >
> > The documentation should clarify when minimal-length solution is
> > provided.

Maybe the phrase "a minimal-length solution or" should be removed?

/Christian Brechbuehler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel