I have two specific questions regarding the output of the lda function in MASS.
#Question1:
#=========
n: sample size, p: number of variables

Some articles in the literature say that LDA is singular for p > n-1. However, my experiments with lda (default arguments) on two-class problems show the collinearity warning for p > n-2. Does anyone know why this is the case? Does lda (MASS) use a different algorithm?

#Question2:
#=========
When I plot leave-one-out CV based on lda (averaged over 500 simulated data sets), I see a peak (see the link http://homepages.ed.ac.uk/mkhondok/temp/lda_R-help-CV.png ) at p = n-3 (not n-2!). I would appreciate it if someone could help me find an explanation for this behaviour.

## R code
## Reproducible example

library(MASS)

# n: sample size
# p: number of variables

## Function
## --------
test.fun <- function(n, p) {
  x <- matrix(rnorm(n * p), ncol = p)
  x[1:(n/2), ] <- x[1:(n/2), ] + 1      # shift the first group by 1
  colnames(x) <- paste("V", 1:p, sep = "")
  y <- rep(c("G1", "G2"), each = n/2)
  dat <- data.frame(y, x)
  lda(y ~ ., data = dat)
}

test.fun(20, 20)  ## Warning: Variables are collinear
test.fun(20, 19)  ## Warning: Variables are collinear
test.fun(20, 18)  ## OK

> sessionInfo()
R version 2.8.0 (2008-10-20)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MASS_7.2-45

-- 
Mizanur Khondoker
Division of Pathway Medicine (DPM)
The University of Edinburgh Medical School
The Chancellor's Building
49 Little France Crescent
Edinburgh EH16 4SB
United Kingdom
Tel: +44 (0) 131 242 6287
Fax: +44 (0) 131 242 6244
http://homepages.ed.ac.uk/mkhondok
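
A possible check related to Question 1, offered as a minimal sketch rather than a description of what lda does internally: it assumes the collinearity warning reflects rank deficiency of the data after each group's own means have been subtracted. The helper within_rank() is hypothetical (it is not part of MASS) and uses base R only, with the same simulated two-class setup as test.fun.

```r
## Hypothetical helper, not part of MASS: numerical rank of the
## group-centred data matrix for the simulated two-class design.
within_rank <- function(n, p, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), ncol = p)
  x[1:(n / 2), ] <- x[1:(n / 2), ] + 1
  g <- rep(c("G1", "G2"), each = n / 2)

  ## Subtract each group's column means from that group's rows.
  xc <- x
  for (k in unique(g)) {
    rows <- g == k
    xk <- x[rows, , drop = FALSE]
    xc[rows, ] <- sweep(xk, 2, colMeans(xk))
  }

  ## Rank as estimated by the QR decomposition (default tolerance).
  qr(xc)$rank
}

within_rank(20, 20)
within_rank(20, 19)
within_rank(20, 18)
```

Within each group the centred rows sum to zero, so with two groups the rank of the centred matrix cannot exceed n - 2 (18 here). Under the stated assumption, that bound would be consistent with the warning appearing for p > n-2 rather than p > n-1. It does not address the peak at p = n-3 in Question 2.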