Ted, I just ran everything using the log of all variables. Much better analysis and it doesn't violate the assumptions.
I'm still in the dark concerning the classification equation, other than the fact that it will now contain log functions.

Thank you for your help,
Chase

Ted.Harding-2 wrote:
>
> [Apologies -- I made an error (see at [***] near the end)]
>
> On 24-May-09 19:07:46, Ted Harding wrote:
>> [Your data and output listings removed. For comments, see at end]
>>
>> On 24-May-09 13:01:26, cdm wrote:
>>> Fellow R Users:
>>> I'm not extremely familiar with lda or R programming, but a recent
>>> editorial review of a manuscript submission has prompted a crash
>>> course. I am on this forum hoping I could solicit some much needed
>>> advice for deriving a classification equation.
>>>
>>> I have used three basic measurements in lda to predict two groups:
>>> male and female. I have a working model, low Wilks' lambda, graphs,
>>> coefficients, eigenvalues, etc. (see below). I adjusted the sample
>>> analysis for Fisher's or Anderson's Iris data provided in the MASS
>>> library for my own data.
>>>
>>> My final and last step is simply to form the classification equation.
>>> The classification equation is simply using standardized coefficients
>>> to classify each group, in this case male or female. A more thorough
>>> explanation is provided:
>>>
>>> "For cases with an equal sample size for each group the classification
>>> function coefficient (Cj) is expressed by the following equation:
>>>
>>> Cj = cj0 + cj1x1 + cj2x2 + ... + cjpxp
>>>
>>> where Cj is the score for the jth group, j = 1 ... k, cj0 is the
>>> constant for the jth group, and x = raw scores of each predictor.
>>> If W = within-group variance-covariance matrix, and M = column matrix
>>> of means for group j, then the constant cj0 = (-1/2)CjMj" (Julia
>>> Barfield, John Poulsen, and Aaron French
>>> http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
>>>
>>> I am unable to navigate this last step based on the R output I have.
>>> I only have the linear discriminant coefficients for each predictor
>>> that would be needed to complete this equation.
>>>
>>> Please, if anybody is familiar or able to help please let me know.
>>> There is a spot in the acknowledgments for you.
>>>
>>> All the best,
>>> Chase Mendenhall
>>
>> The first thing I did was to plot your data. This indicates in the
>> first place that a perfect discrimination can be obtained on the
>> basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
>> to WG, WT, ID, SEX):
>>
>>   D0 <- read.csv("horsesLDA.csv")
>>   # names(D0)
>>   # "WRMA_WG" "WRMA_WT" "WRMA_ID" "WRMA_SEX"
>>   WG<-D0$WRMA_WG; WT<-D0$WRMA_WT;
>>   ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX
>>
>>   ix.M<-(SEX=="M"); ix.F<-(SEX=="F")
>>
>>   ## Plot WT vs ID (M & F)
>>   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>>   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WT[ix.F],pch="+",col="red")
>>   lines(ID,15.5-1.0*(ID))
>>
>> and that there is a lot of possible variation in the discriminating
>> line WT = 15.5-1.0*(ID)
>>
>> Also, it is apparent that the covariance between WT and ID for Females
>> is different from the covariance between WT and ID for Males. Hence
>> the assumption (of common covariance matrix in the two groups) for
>> standard LDA (which you have been applying) does not hold.
>>
>> Given that the sexes can be perfectly discriminated within the data
>> on the basis of the linear discriminator (WT + ID) (and others),
>> the variable WG is in effect a close approximation to noise.
>>
>> However, to the extent that there was a common covariance matrix
>> to the two groups (in all three variables WG, WT, ID), and this
>> was well estimated from the data, then inclusion of the third
>> variable WG could yield a slightly improved discriminator in that
>> the probability of misclassification (a rare event for such data)
>> could be minimised. But it would not make much difference!
>>
>> However, since that assumption does not hold, this analysis would
>> not be valid.
>>
>> If you plot WT vs WG, a common covariance is more plausible; but
>> there is considerable overlap for these two variables:
>>
>>   plot(WG,WT)
>>   points(WG[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(WG[ix.F],WT[ix.F],pch="+",col="red")
>>
>> If you plot WG vs ID, there is perhaps not much overlap, but a
>> considerable difference in covariance between the two groups:
>>
>>   plot(ID,WG)
>>   points(ID[ix.M],WG[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WG[ix.F],pch="+",col="red")
>>
>> This looks better on a log scale, however:
>>
>>   lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
>>   ## Plot log(WG) vs log(ID) (M & F)
>>   plot(lID,lWG)
>>   points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
>>   points(lID[ix.F],lWG[ix.F],pch="+",col="red")
>>
>> and common covariance still looks good for WG vs WT:
>>
>>   ## Plot log(WT) vs log(WG) (M & F)
>>   plot(lWG,lWT)
>>   points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
>>   points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
>>
>> but there is no improvement for WT vs ID:
>>
>>   ## Plot log(WT) vs log(ID) (M & F)
>>   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>>   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WT[ix.F],pch="+",col="red")
>
> [***]
> The above is incorrect! Apologies. I plotted the raw WT and ID
> instead of their logs. In fact, if you do plot the logs:
>
>   ## Plot log(WT) vs log(ID) (M & F)
>   plot(lID,lWT)
>   points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
>   points(lID[ix.F],lWT[ix.F],pch="+",col="red")
>
> you now get what looks like much closer agreement between the
> covariances cov(lID,lWT) than before. Hence, I would now suggest
> that you do your linear discrimination on the logarithms of the
> variables (since you also get agreement for the other pairs on
> the log scale).
>
> In fact:
>
> [Raw]:
> [Male]:
>   cov(cbind(WG,WT,ID)[ix.M,])
>   #            WG         WT          ID
>   # WG  2.2552465 0.11074710 -0.02202080
>   # WT  0.1107471 0.33853450  0.06601287
>   # ID -0.0220208 0.06601287  0.31979368
>
> [Female]:
>   cov(cbind(WG,WT,ID)[ix.F,])
>   #           WG        WT        ID
>   # WG 2.4716912 0.1577307 0.6670657
>   # WT 0.1577307 0.3183928 0.2973335
>   # ID 0.6670657 0.2973335 2.8326520
>
> [log]:
> [Male]:
>   cov(cbind(lWG,lWT,lID)[ix.M,])
>   #               lWG          lWT           lID
>   # lWG  0.0006584465 0.0001813315 -0.0002133576
>   # lWT  0.0001813315 0.0030368382  0.0030442356
>   # lID -0.0002133576 0.0030442356  0.0693965979
>
> [Female]:
>   cov(cbind(lWG,lWT,lID)[ix.F,])
>   #              lWG          lWT         lID
>   # lWG 0.0007244826 0.0002171885 0.001951343
>   # lWT 0.0002171885 0.0019640076 0.003305884
>   # lID 0.0019513428 0.0033058841 0.068406840
>
>
>> So there is no simple road to applying a routine LDA to your data.
>>
>> To take account of different covariances between the two groups,
>> you would normally be looking at a quadratic discriminator. However,
>> as indicated above, the fact that a linear discriminator using
>> the variables ID & WT alone works so well would leave considerable
>> imprecision in conclusions to be drawn from its results.
>>
>> Sorry this is not the straightforward answer you were hoping for
>> (which I confess I have not sought); it is simply a reaction to
>> what your data say.
>>
>> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 24-May-09                                       Time: 21:49:50
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/Animal-Morphology%3A-Deriving-Classification-Equation-with-Linear-Discriminat-Analysis-%28lda%29-tp23693355p23698217.html
Sent from the R help mailing list archive at Nabble.com.
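
On the classification equation itself, here is a minimal sketch, assuming equal priors (equal group sizes) and a pooled within-group covariance matrix W, as in the formula quoted in the thread. Under those assumptions the coefficient vector for group j can be taken as W^{-1} m_j (m_j being the vector of group means) and the constant as -(1/2) c_j' m_j, which is one reading of the quoted cj0 = (-1/2)CjMj rule. The data below are simulated stand-ins, not the measurements in horsesLDA.csv, and the names X, sex, M, W, coefs, const, scores and pred are hypothetical.

```r
# Sketch only: simulated stand-in data, NOT the measurements from this thread.
set.seed(1)
n   <- 50
sex <- factor(rep(c("F", "M"), each = n))

# Hypothetical log-scale predictors (names mirror lWG, lWT, lID used above)
X <- cbind(
  lWG = rnorm(2 * n, mean = rep(c(4.00, 4.05), each = n), sd = 0.03),
  lWT = rnorm(2 * n, mean = rep(c(2.30, 2.50), each = n), sd = 0.05),
  lID = rnorm(2 * n, mean = rep(c(1.20, 1.90), each = n), sd = 0.25)
)

# Group means: one row per group, one column per predictor
M <- rowsum(X, sex) / as.vector(table(sex))

# Pooled within-group covariance matrix W
W <- Reduce(`+`, lapply(levels(sex), function(g) {
  Xg <- X[sex == g, , drop = FALSE]
  (nrow(Xg) - 1) * cov(Xg)
})) / (nrow(X) - nlevels(sex))

# Classification function coefficients: row j holds c_j1 ... c_jp
coefs <- M %*% solve(W)

# Constants c_j0 = -(1/2) * c_j' m_j (equal priors assumed, so no log-prior term)
const <- -0.5 * rowSums(coefs * M)

# Score every case for every group, then assign to the highest score
scores <- sweep(X %*% t(coefs), 2, const, `+`)
pred   <- colnames(scores)[max.col(scores, ties.method = "first")]

print(cbind(constant = const, coefs))  # one classification equation per row
print(table(observed = sex, predicted = pred))
```

Each printed row is one classification equation: a new animal goes to the group whose equation gives the larger score. Since the predictors are logs, raw measurements would need to be log-transformed before being plugged in. With MASS installed, the classes from predict(MASS::lda(X, grouping = sex, prior = c(0.5, 0.5)), X)$class should agree with pred, which is a reasonable cross-check on real data.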