Does anyone know of any literature on the kappa statistic with PLS-DA? I have been looking for papers that used PLS-DA for classification and have yet to come across this kappa value. The papers I find typically report R2 as an indicator of model fit. I want to make sure I conduct this kind of analysis appropriately, so any guidance is appreciated.
Regards,
Charles

On Sun, Mar 3, 2013 at 4:38 PM, Max Kuhn <mxk...@gmail.com> wrote:

> That's the most common formula, but not the only one. See
>
> Kvålseth, T. (1985). Cautionary note about R^2. *American Statistician*,
> *39*(4), 279–285.
>
> Traditionally, the symbol 'R' is used for the Pearson correlation
> coefficient, and one way to calculate R^2 is... R^2.
>
> Max
>
> On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>
>> I was under the impression that in PLS analysis, R2 was calculated as
>> 1 - (residual sum of squares) / (total sum of squares). Is this still what
>> you are referring to? I am aware of the linear R2, which measures how well
>> two variables are correlated, but that formula seems different to me.
>> Could you explain whether this is the same concept?
>>
>> Charles
>>
>> On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>
>>> > Is there some literature on which you base that statement?
>>>
>>> No, but there isn't literature on changing a lightbulb with a duck
>>> either.
>>>
>>> > Are these papers incorrect in using these statistics?
>>>
>>> Definitely, if they convert 3+ categories to integers (but there are
>>> specialized R^2 metrics for binary classification models). Otherwise, they
>>> are just using an ill-suited "score".
>>>
>>> How would you explain such an R^2 value to someone? R^2 is
>>> a function of the correlation between the two random variables. For two
>>> classes, one of them is binary. What does it mean?
>>>
>>> Historically, models rooted in computer science (e.g., neural networks)
>>> used RMSE or SSE to fit models with binary outcomes, and that *can* work
>>> well.
>>>
>>> However, I don't think that communicating R^2 is effective. Other
>>> metrics (e.g., accuracy, Kappa, area under the ROC curve) are designed
>>> to measure a model's ability to classify, and they work well. With 3+
>>> categories, I tend to use Kappa.
>>>
>>> Max
>>>
>>> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr
>>> <deter...@umn.edu> wrote:
>>>
>>>> Thank you for your response, Max. Is there some literature on which you
>>>> base that statement? I am confused, as I have seen many publications that
>>>> report R^2 and Q^2 following a PLS-DA analysis. The goal of the analysis is
>>>> usually to discriminate groups (i.e., classification). Are these papers
>>>> incorrect in using these statistics?
>>>>
>>>> Regards,
>>>> Charles
>>>>
>>>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>>>
>>>>> Charles,
>>>>>
>>>>> You should not be treating the classes as numeric (is virginica
>>>>> really three times setosa?). Q^2 and/or R^2 are not appropriate for
>>>>> classification.
>>>>>
>>>>> Max
>>>>>
>>>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr
>>>>> <deter...@umn.edu> wrote:
>>>>>
>>>>>> I have discovered one of my errors. The timematrix was unnecessary and
>>>>>> an unfortunate habit I brought from another package. The following
>>>>>> provides the same R2 values, as it should. However, I still don't know
>>>>>> how to retrieve Q2 values.
>>>>>> Any insight would again be appreciated:
>>>>>>
>>>>>> library(caret)
>>>>>> library(pls)
>>>>>>
>>>>>> data(iris)
>>>>>>
>>>>>> #needed to convert to numeric in order to do regression
>>>>>> #I don't fully understand this, but if I left it as a factor I would get
>>>>>> #an error following the summary function
>>>>>> iris$Species=as.numeric(iris$Species)
>>>>>> inTrain1=createDataPartition(y=iris$Species,
>>>>>>                              p=.75,
>>>>>>                              list=FALSE)
>>>>>>
>>>>>> training1=iris[inTrain1,]
>>>>>> testing1=iris[-inTrain1,]
>>>>>>
>>>>>> ctrl1=trainControl(method="cv",
>>>>>>                    number=10)
>>>>>>
>>>>>> plsFit2=train(Species~.,
>>>>>>               data=training1,
>>>>>>               method="pls",
>>>>>>               trControl=ctrl1,
>>>>>>               metric="Rsquared",
>>>>>>               preProc=c("scale"))
>>>>>>
>>>>>> data(iris)
>>>>>> training1=iris[inTrain1,]
>>>>>> datvars=training1[,1:4]
>>>>>> dat.sc=scale(datvars)
>>>>>>
>>>>>> pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
>>>>>>              ncomp=3, method="oscorespls", data=training1)
>>>>>>
>>>>>> x=crossval(pls.dat, segments=10)
>>>>>>
>>>>>> summary(x)
>>>>>> summary(plsFit2)
>>>>>>
>>>>>> Regards,
>>>>>> Charles
>>>>>>
>>>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr <deter...@umn.edu>
>>>>>> wrote:
>>>>>>
>>>>>> > Greetings,
>>>>>> >
>>>>>> > I have been exploring the use of the caret package to conduct some
>>>>>> > PLS-DA modeling. Previously, I have come across methods that produce an
>>>>>> > R2 and Q2 for the model. Using the 'iris' data set, I wanted to see if I
>>>>>> > could accomplish this with the caret package.
>>>>>> > I use the following code:
>>>>>> >
>>>>>> > library(caret)
>>>>>> > library(pls)      # provides plsr() and crossval()
>>>>>> > library(kohonen)  # provides classvec2classmat()
>>>>>> > data(iris)
>>>>>> >
>>>>>> > #needed to convert to numeric in order to do regression
>>>>>> > #I don't fully understand this, but if I left it as a factor I would get
>>>>>> > #an error following the summary function
>>>>>> > iris$Species=as.numeric(iris$Species)
>>>>>> > inTrain1=createDataPartition(y=iris$Species,
>>>>>> >                              p=.75,
>>>>>> >                              list=FALSE)
>>>>>> >
>>>>>> > training1=iris[inTrain1,]
>>>>>> > testing1=iris[-inTrain1,]
>>>>>> >
>>>>>> > ctrl1=trainControl(method="cv",
>>>>>> >                    number=10)
>>>>>> >
>>>>>> > plsFit2=train(Species~.,
>>>>>> >               data=training1,
>>>>>> >               method="pls",
>>>>>> >               trControl=ctrl1,
>>>>>> >               metric="Rsquared",
>>>>>> >               preProc=c("scale"))
>>>>>> >
>>>>>> > data(iris)
>>>>>> > training1=iris[inTrain1,]
>>>>>> > datvars=training1[,1:4]
>>>>>> > dat.sc=scale(datvars)
>>>>>> >
>>>>>> > n=nrow(dat.sc)
>>>>>> > dat.indices=seq(1,n)
>>>>>> >
>>>>>> > timematrix=with(training1,
>>>>>> >                 classvec2classmat(Species[dat.indices]))
>>>>>> >
>>>>>> > pls.dat=plsr(timematrix ~ dat.sc,
>>>>>> >              ncomp=3, method="oscorespls", data=training1)
>>>>>> >
>>>>>> > x=crossval(pls.dat, segments=10)
>>>>>> >
>>>>>> > summary(x)
>>>>>> > summary(plsFit2)
>>>>>> >
>>>>>> > I see two different R2 values, and I cannot figure out how to get the Q2
>>>>>> > value. Any insight as to what my errors may be would be appreciated.
>>>>>> >
>>>>>> > Regards,
>>>>>> >
>>>>>> > --
>>>>>> > Charles
>>>>>>
>>>>>> --
>>>>>> Charles Determan
>>>>>> Integrated Biosciences PhD Student
>>>>>> University of Minnesota
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> --
>>>>> Max
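The kappa statistic Charles asks about at the top of this message, together with Max's point about not coding classes as integers, can be illustrated with a small base-R sketch. It codes the species as a 0/1 indicator matrix (one column per class) instead of the integers 1/2/3 that as.numeric(iris$Species) produces, predicts the class with the largest fitted score, and scores those predictions with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the table's margins. Assumptions, loudly: ordinary least squares via lm() stands in for PLS, the model is scored on its own training data (so the numbers are optimistic), and cohen_kappa() is a hypothetical helper written only for this illustration. Within caret itself, passing a factor outcome to train() with method = "pls" fits a PLS-DA model, and metric = "Kappa" chooses the number of components by resampled Kappa.

```r
# Sketch: indicator (dummy) coding of the classes plus Cohen's kappa.
# Assumption: lm() on the indicator matrix stands in for PLS-DA, and the
# model is scored on its own training data (optimistic, illustration only).
data(iris)

# One 0/1 column per species instead of the integer codes 1, 2, 3
Y <- model.matrix(~ Species - 1, data = iris)

fit <- lm(Y ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
          data = iris)

# Predicted class = the column with the largest fitted score
scores <- predict(fit, newdata = iris)
pred <- factor(levels(iris$Species)[max.col(scores, ties.method = "first")],
               levels = levels(iris$Species))

# Confusion table: rows = predicted class, columns = observed class
tab <- table(predicted = pred, observed = iris$Species)

# Hypothetical helper: Cohen's kappa from a square confusion table
cohen_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement (accuracy)
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from the margins
  (po - pe) / (1 - pe)
}

accuracy  <- sum(diag(tab)) / sum(tab)
kappa_hat <- cohen_kappa(tab)

print(tab)
cat("Accuracy:", round(accuracy, 3), "  Kappa:", round(kappa_hat, 3), "\n")
```

Unlike R^2 on integer-coded classes, kappa is defined directly on the confusion table, so it does not depend on an arbitrary ordering or spacing of the class codes.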
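The question of which R^2 is meant (Charles's 1 - RSS/TSS versus the squared Pearson correlation Max refers to) can also be sketched in a few lines of base R, along with Q^2, which uses the same 1 - SS ratio but with out-of-sample prediction residuals (PRESS). Assumptions: a simple lm() fit on the built-in mtcars data stands in for a PLS model, r2_ss() and r2_cor() are hypothetical helper names, and Q^2 = 1 - PRESS/TSS is used as one common convention. The leave-one-out shortcut e_i / (1 - h_ii) used below is exact only for ordinary least squares.

```r
# Sketch: two R^2 formulas and a leave-one-out Q^2, using lm() on mtcars
# as a stand-in for a PLS model (an assumption made for illustration).
data(mtcars)

fit  <- lm(mpg ~ wt, data = mtcars)
y    <- mtcars$mpg
yhat <- fitted(fit)

# Hypothetical helpers for the two definitions discussed in the thread
r2_ss  <- function(y, pred) 1 - sum((y - pred)^2) / sum((y - mean(y))^2)  # 1 - RSS/TSS
r2_cor <- function(y, pred) cor(y, pred)^2                                # squared Pearson r

# OLS with an intercept, scored on its training data: both forms agree
cat("OLS fit:     1 - RSS/TSS =", round(r2_ss(y, yhat), 4),
    "  cor^2 =", round(r2_cor(y, yhat), 4), "\n")

# A biased prediction: same correlation with y, much larger residuals,
# so cor^2 is unchanged while 1 - RSS/TSS falls (here it is negative)
biased <- 2 * yhat + 1
cat("Biased pred: 1 - RSS/TSS =", round(r2_ss(y, biased), 4),
    "  cor^2 =", round(r2_cor(y, biased), 4), "\n")

# Q^2 = 1 - PRESS/TSS, with PRESS built from leave-one-out residuals.
# For OLS the LOO residual of point i is e_i / (1 - h_ii), so no refitting is needed.
tss   <- sum((y - mean(y))^2)
press <- sum((resid(fit) / (1 - hatvalues(fit)))^2)
q2    <- 1 - press / tss
cat("Q^2 (leave-one-out) =", round(q2, 4), "\n")
```

The two R^2 forms coincide for a least-squares fit with an intercept scored on its own training data, but diverge for any other predictions, which is one reason a reported R^2 should state its formula. With the pls package, R2() called with estimate = "CV" on a cross-validated fit (such as the x object in Charles's code) should report a cross-validated analogue; check ?R2 for its exact definition.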