I have gone through McCullough and Wilson, two of the papers from the Stern
school (Simon and Simonoff), Cryer, and Goldwater and have workarounds or
fixes for each of the problems that they raise (with data sets) about EXCEL.
There is one however that I don't have a fix for. For example using EXCEL, I
can beat Stata (IRE(coefficients)=11.1 to 15.1) on a 10th order polynomial
fit to the NIST Filip Data Set.
The one that I can't fix, is the problem of multivariate linear regression
fit. In this case there is a singular or near singular X matrix. Given N
columns of X, the rank of X here is N-1 or less.
Lets assume we have the case of the dental data from Gary Simon. Here we
have 54 observations, with 5 variables, 1 for the intercept, 1 for a set of
different values from 0 to 8 (Molar Numb), 2 variables which are indicator
variables (Alc and Drg) and one variable (Drg Grp) is a sum (actually 4
minus the sum) of the two indicator variables. X clearly is singular, but
EXCEL gives an answer with LINEST. I have my own set of matrix and
regression subroutines in QB, which I used to clarify why this occurs.
When I do a Gaussian triangularization with pivoting on X, I get a very
clear true zero on the pivoting, indicating singularity. Going to the X'X
positive definite 5x5 matrix, I end up with reasonable values, but it still
is a singular matrix. Doing the Gaussian again, I get a diagonal value of
8.4E-15, which is close to zero, but not a true zero. Therefore I can invert
X'X in EXCEL and it works, but the values are screwy. I get values of about
E+14 with one row and one column of E-03 values. No divide check errors.
With finite values of X'Y and (X'X)^-1, you get a reasonable set of
coefficient values.
These are my questions:
1. Given that X is singular, how can I test this in EXCEL given that I only
have the MMULT, MINVERSE and MDETERM matrix functions available in EXCEL?
One test is to calculate MDETERM on the X'X matrix. For this data set, the
determinant of X'X is ~E-08, which is not particularly small. Another test
is to multiply (X'X)^-1 times X"X and look for something clearly not a
unitary matrix. In this case it very clearly is nothing like 'I'. I tend to
favor the latter.
2. Do we or do we not teach accountants and business majors in their
introductory stat class based on EXCEL about matrices and singularities,
rank and eigenvalues? Assuming that this is never covered or taught, what
supplementary material should be passed out along with the training on the
use of EXCEL?
3. What is the appropriate screen to use with EXCEL (without additional
macros) to indicate that the results are wrong in a regression. With the
complicated data sets now being fitted, singularity is not obvious.
4. Telling students that EXCEL does not properly compute multivariate
regression is obviously an over-kill.
Any thoughts.
DAHeiser
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=================================================================