[Rd] GCV in lm.ridge (MASS) (PR#10755)

2008-02-13 Thread A . Robinson
Full_Name: Andrew Robinson
Version: 2.6.2 Patched (2008-02-12 r44439)
OS: FreeBSD 6.3-RC1
Submission from: (NULL) (211.28.206.186)


I believe that the computation for GCV is incorrect in the lm.ridge function in
MASS. 

From lm.ridge:

GCV <- colSums((Y - X %*% coef)^2)/
(n - colSums(matrix(d^2/div, dx)))^2
  
The denominator does not tally with the formula on p. 141 of Ripley's Pattern
Recognition & Neural Networks.  I think that it should be

GCV <- colSums((Y - X %*% coef)^2)/
(1 - colSums(matrix(d^2/div, dx))/n)^2 / n
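
The two expressions can be compared numerically. Below is a minimal base-R sketch on simulated data. The data, the lambda grid, and all variable names are assumptions for illustration; it follows the SVD-based computation only in outline and skips the column scaling that lm.ridge applies. Writing t for the trace term, (1 - t/n)^2 * n = (n - t)^2 / n, so the second expression is n times the first and both should be minimised at the same lambda. The sketch checks this.

```r
# Hypothetical simulated data (not from the report); centred so that
# no intercept column is needed.
set.seed(1)
n <- 30
p <- 3
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)
y <- y - mean(y)

lambda <- c(0, 0.5, 1, 5, 10)   # arbitrary grid of ridge penalties

s <- svd(X)
d <- s$d
rhs <- drop(crossprod(s$u, y))  # U'y

gcv_current  <- numeric(length(lambda))
gcv_proposed <- numeric(length(lambda))
for (i in seq_along(lambda)) {
  div  <- d^2 + lambda[i]
  coef <- s$v %*% (d * rhs / div)   # ridge coefficients for this lambda
  rss  <- sum((y - X %*% coef)^2)   # residual sum of squares
  tr   <- sum(d^2 / div)            # trace term, intercept not counted
  gcv_current[i]  <- rss / (n - tr)^2          # form found in lm.ridge
  gcv_proposed[i] <- rss / (1 - tr / n)^2 / n  # form suggested above
}

ratio <- gcv_proposed / gcv_current
print(ratio)                                        # compare with n
print(which.min(gcv_current) == which.min(gcv_proposed))
```

If the ratio is constant, the choice between the two forms affects the reported GCV values but not the selected lambda.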

Also, neither formula (above) counts the intercept amongst the parameters.
This makes sense from the point of view that the intercept is not
shrunk in ridge regression, but if it has been conditioned on for
computing the residual sum of squares, then there is an argument that
it should be included in the trace of the mapping matrix anyway.

Thanks

Andrew


> sessionInfo()
R version 2.6.2 Patched (2008-02-12 r44439) 
i386-unknown-freebsd6.3 

locale:
C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] MASS_7.2-40

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Cook's Distance in GLM (PR#9316)

2006-10-24 Thread A . Robinson
Hi Community,

I'm trying to reconcile Cook's Distances computed in glm.  The
following snippet of code shows that the Cook's Distances contours on
the plot of Residuals v Leverage do not seem to be the same as the
values produced by cooks.distance() or in the Cook's Distance against
observation number plot.

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())

opar <- par(mfrow=c(2,1))
plot(glm.D93, which=c(4,5))
par(opar)

cooks.distance(glm.D93)

The difference is reasonably moderate in this case.  My suspicions
were aroused by a case in which the plot showed five or six points
greater than 1, none of which could be identified in the output of the
function.
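
One way to check which quantity is being reported is to compute the distances by hand. The sketch below assumes the usual form based on Pearson residuals and hat values, (r / (1 - h))^2 * h / (phi * p). Whether cooks.distance() and the plot contours use exactly this form in a given version of R is an assumption to verify, not something established here. The name cd_by_hand is hypothetical.

```r
# Same example model as above, repeated so the block runs on its own.
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

h   <- hatvalues(glm.D93)                      # leverages
r   <- residuals(glm.D93, type = "pearson")    # Pearson residuals
p   <- glm.D93$rank                            # number of fitted parameters
phi <- summary(glm.D93)$dispersion             # dispersion estimate

# Assumed formula for Cook's distance in a GLM (see the note above).
cd_by_hand <- (r / (1 - h))^2 * h / (phi * p)

# Side-by-side comparison with the built-in function.
print(cbind(by_hand = cd_by_hand, builtin = cooks.distance(glm.D93)))
print(all.equal(cd_by_hand, cooks.distance(glm.D93)))
```

If the two columns agree, any remaining discrepancy would lie in how the contours on the Residuals v Leverage plot are drawn.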

> version
               _
platform       i386-unknown-freebsd6.1
arch           i386
os             freebsd6.1
system         i386, freebsd6.1
status         Patched
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39576
language       R
version.string R version 2.4.0 Patched (2006-10-03 r39576)


Cheers

Andrew

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
