[Rd] behaviour and documentation of qr.solve

2019-08-14 Thread Michael Meyer via R-devel


Greetings,

In my opinion, the documentation or behaviour of qr.solve, qr.coef, qr.resid
and qr.fitted is hard to comprehend and unfortunate.
We all know that a linear system Ax=b can have zero, one or infinitely many
solutions. To treat all these cases uniformly, we can rephrase the problem
as
x = argmin_u||Au-b||,   


where ||.|| denotes the Euclidean norm. There is then exactly one natural and
distinguished solution x: the minimizer that itself has minimal
Euclidean norm. So if we want to return only one solution, I think we can agree
that this should be it.

In fact, this very solution can be computed from the QR decomposition of either
A (overdetermined system) or t(A) (underdetermined system), provided A has full rank.
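For reference, a short derivation of both cases, assuming A has full rank (the rank-deficient case needs more than a plain QR decomposition). The symbols Q and R denote the thin QR factors of A or t(A) respectively:

```latex
\begin{aligned}
&\text{Overdetermined: } A \in \mathbb{R}^{m\times n},\ m \ge n,\ \operatorname{rank} A = n.\\
&\quad A = QR,\ Q^\top Q = I_n
  \;\Rightarrow\; \|Au-b\|^2 = \|Ru - Q^\top b\|^2 + \|(I - QQ^\top)\,b\|^2
  \;\Rightarrow\; x = R^{-1} Q^\top b.\\[4pt]
&\text{Underdetermined: } m \le n,\ \operatorname{rank} A = m.\\
&\quad A^\top = QR \;\Rightarrow\; A = R^\top Q^\top;\ \text{write } u = Qz + w \text{ with } Q^\top w = 0,\\
&\quad Au = R^\top z = b \;\Rightarrow\; z = R^{-\top} b,\qquad \|u\|^2 = \|z\|^2 + \|w\|^2
  \;\Rightarrow\; x = Q R^{-\top} b \quad (w = 0).
\end{aligned}
```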
I tried qr.solve on the underdetermined system Ax=b with

  b <- c(3,5,7)

and

  A <- rbind(
    c(1,1,1,1,1),
    c(1,2,2,2,2),
    c(1,2,3,3,3)
  )

The system has infinitely many solutions. The minimal norm solution is
x=c(1,0,2/3,2/3,2/3).
But qr.solve(A,b) yielded the solution x=c(1,0,2,0,0), which is distinguished
only by being sparse, and I do not think qr.solve
tries to compute the sparsest solution. So what does qr.solve do in the case of
an underdetermined system?
It is not documented.
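A minimal R sketch of the t(A) route, assuming A has full row rank. The helper min_norm_solve() is hypothetical (it is not a base R function); it uses only qr(), qr.Q(), qr.R() and forwardsolve(), and compares its result with what qr.solve returns for the system above.

```r
# Hypothetical helper: minimum-norm solution of an underdetermined system A x = b,
# assuming A (m x n, m < n) has full row rank m.
min_norm_solve <- function(A, b, tol = 1e-7) {
  q <- qr(t(A), tol = tol)             # t(A)[, pivot] = Q %*% R
  if (q$rank < nrow(A)) stop("A must have full row rank")
  Q <- qr.Q(q)                         # n x m, orthonormal columns
  R <- qr.R(q)                         # m x m, upper triangular
  # A[pivot, ] = t(R) %*% t(Q); with x = Q z this reduces to t(R) z = b[pivot]
  z <- forwardsolve(t(R), b[q$pivot])
  drop(Q %*% z)                        # x lies in the row space of A
}

b <- c(3, 5, 7)
A <- rbind(c(1, 1, 1, 1, 1),
           c(1, 2, 2, 2, 2),
           c(1, 2, 3, 3, 3))

x_min <- min_norm_solve(A, b)   # approximately c(1, 0, 2/3, 2/3, 2/3)
x_qr  <- qr.solve(A, b)         # the solution reported above

# Both satisfy A x = b (up to rounding) ...
max(abs(A %*% x_min - b))
max(abs(A %*% x_qr  - b))
# ... but only x_min has minimal Euclidean norm: sqrt(7/3) versus sqrt(5)
c(sqrt(sum(x_min^2)), sqrt(sum(x_qr^2)))
```

Working from the QR factors of t(A) avoids forming A %*% t(A) for the closed form t(A) %*% solve(A %*% t(A), b), which would square the condition number.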

Then I tried to figure out what qr.coef, qr.resid and qr.fitted do. I had to
run actual experiments to find out that they seem to solve the
problem x=argmin_u||Au-y|| with

   qr.coef(qr(A),y)   = x    (but which x, when there are infinitely many?)
   qr.fitted(qr(A),y) = Ax
   qr.resid(qr(A),y)  = y-Ax

This is certainly not evident from the language of the documentation, which
conflates qr(x,...) with solving the system Ax=b and then states:

"The functions qr.coef, qr.resid, and qr.fitted return the coefficients,
residuals and fitted values obtained when fitting y to the matrix with
QR decomposition qr."
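To make these relations concrete, a small R sketch for a full-rank overdetermined system. The matrix A and vector y are illustrative data, not from the original report. It checks that qr.fitted and qr.resid agree with A %*% qr.coef and its complement, and that the residual is orthogonal to the columns of A, which characterises the least-squares minimizer.

```r
# Illustrative overdetermined system: 5 equations, 2 unknowns, full column rank
A <- cbind(1, 1:5)
y <- c(1, 3, 2, 5, 4)

q   <- qr(A)                          # these functions take a "qr" object, not A
x   <- as.vector(qr.coef(q, y))       # argmin_u ||Au - y||, here c(0.6, 0.8)
fit <- as.vector(qr.fitted(q, y))     # "fitted values"
res <- as.vector(qr.resid(q, y))      # "residuals"

all.equal(fit, as.vector(A %*% x))    # fitted values are A x
all.equal(res, y - fit)               # residuals are y - A x
crossprod(A, res)                     # numerically zero: res is orthogonal to col(A)
# Same x as from the normal equations t(A) A x = t(A) y
all.equal(x, as.vector(solve(crossprod(A), crossprod(A, y))))
```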

Since when do we call solving a system of equations "fitting the right-hand
side to the matrix ...", or call the solution x "the coefficients"
(which more usually are the elements of A), or introduce the "fitted values"
with no definition?
Moreover, the language does not fit the underdetermined case Ax=y, where we need
the QR decomposition of t(A), not of A, to compute the
minimizer x = argmin_u||Au-y|| that itself has minimal norm.

Or maybe this is not at all what these functions do.
But then what do they do, and should this not be evident from the documentation?

Sincerely,

Michael Meyer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Underscores in package names

2019-08-14 Thread Martin Maechler
> Duncan Murdoch 
> on Fri, 9 Aug 2019 20:23:28 -0400 writes:

> On 09/08/2019 4:37 p.m., Gabriel Becker wrote:
>> Duncan,
>> 
>> 
>> On Fri, Aug 9, 2019 at 1:17 PM Duncan Murdoch wrote:
>> 
>> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
>> > Note that this proposal would make mypackage_2.3.1 a valid *package name*,
>> > whose corresponding tarball name might be mypackage_2.3.1_2.3.2 after a
>> > patch. Yes, it's a silly example, but why allow that kind of ambiguity?
>> >
>> CRAN already has a package named "FuzzyNumbers.Ext.2", whose tarball is
>> FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've already lost that game.
>> 
>> 
>> I suppose technically 2 is a valid version number for a package (?) so I 
>> suppose you have me there. But as Ben pointed out while I was writing 
>> this, all I can really say is that in practice they read to me (as 
>> someone who has administered R on a large cluster and written 
>> build-system software for it) as substantially different levels of 
>> ambiguity. I do acknowledge, as Ben does, that yes a more complex 
>> regular expression/splitting algorithm can be written that would handle 
>> the more general package names. I just don't personally see a motivation 
>> that justifies changing something this fundamental (even if it is both 
>> narrow and was initially more or less arbitrarily chosen) about R at 
>> this late date.
>> 
>> At the end of the day, I guess what I'm saying is that breaking
>> and changing things is sometimes good, but if we're going to rock the
>> boat, personally I'd want to do so going after bigger wins than this one.
>> That's just my opinion though.

> Sorry, I wasn't clear.  I agree with you.  I was just saying that the 
> particular argument based on ugly tarball names isn't the reason.

> Duncan Murdoch

Thank you (and Gabe).

We have had some internal R Core "talk" about Jim Hester's
suggestion (of adding underscores to the allowed characters in
package names).
Duncan had already given a good reason why such a change would be problematic
(the underscore being used as the unique separator of package name
and version in source and binary package archives).
Even with Jim's offer to find and provide patches for all places
where this is used in the R sources, we've convinced ourselves that
there is much more code "out there", notably 'devops' code in
scripts, which relies on the current package naming
rules and which could break, often only rarely and hence
possibly unnoticed for too long.
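As an illustration of the kind of code that could break, here is a minimal R sketch of a hypothetical helper, split_tarball(), that splits a source tarball name at the first underscore. It is correct under the current naming rules, but it would silently mis-parse the hypothetical package name mypackage_2.3.1 mentioned earlier in the thread.

```r
# Hypothetical helper: split a source tarball name into package name and version,
# relying on the current rule that package names never contain "_".
split_tarball <- function(file) {
  m <- regmatches(file, regexec("^([^_]+)_(.+)\\.tar\\.gz$", file))[[1]]
  if (length(m) == 0L) stop("not a source tarball name: ", file)
  c(package = m[2], version = m[3])
}

# Works today, even with a name that ends in ".2":
split_tarball("FuzzyNumbers.Ext.2_3.2.tar.gz")
# package = "FuzzyNumbers.Ext.2", version = "3.2"

# If "_" were allowed, version 2.3.2 of a package called "mypackage_2.3.1"
# would be mis-split without any error:
split_tarball("mypackage_2.3.1_2.3.2.tar.gz")
# package = "mypackage", version = "2.3.1_2.3.2"
```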

Also, we've not seen compelling arguments that the current scheme
would be too limited (people mentioned that if you must use a
separator, "." is available).

Consequence:  We stay with the stability principle and the
package naming scheme is _not_ going to be changed for now.

Martin Maechler
ETH Zurich and R Core Team
