Re: [R] caret pls model statistics

Charles Determan Jr Tue, 05 Mar 2013 06:15:20 -0800

Does anyone know of any literature on the kappa statistic with plsda?  I
have been trying to find papers that used plsda for classification and have
yet to come across this kappa value.  All the papers I come across
typically report R2 as an indicator of model fit.  I want to make sure I
conduct such an analysis appropriately; any guidance is appreciated.
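
For reference, a minimal sketch of what the Kappa statistic measures, written
in base R so it does not depend on any package.  The helper name cohen_kappa
and the confusion-table counts are made up for illustration; they are not
results from the iris models in this thread.

```r
# Cohen's Kappa from a square confusion table
# (rows = predicted class, columns = observed class).
cohen_kappa <- function(tab) {
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement (accuracy)
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Hypothetical counts for a three-class problem (illustration only).
classes <- c("setosa", "versicolor", "virginica")
tab <- matrix(c(12,  0,  0,
                 0, 10,  2,
                 0,  2, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(predicted = classes, observed = classes))

kappa_value <- cohen_kappa(tab)
print(kappa_value)
```

Kappa rescales accuracy by the agreement expected from the class frequencies
alone, so it needs no numeric coding of the classes.  As far as I know, caret
reports Accuracy and Kappa when the outcome passed to train() is left as a
factor, and confusionMatrix() prints the same statistic for a set of
predictions.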


Regards,
Charles

On Sun, Mar 3, 2013 at 4:38 PM, Max Kuhn <mxk...@gmail.com> wrote:

> That's the most common formula, but not the only one. See
>
>   Kvålseth, T. (1985). Cautionary note about $R^2$. *American Statistician*,
>   *39*(4), 279-285.
>
> Traditionally, the symbol 'R' is used for the Pearson correlation
> coefficient and one way to calculate R^2 is... R^2.
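>
> For reference, the two definitions being compared can be written out as
> below.  The notation is assumed here rather than taken from the thread:
> y_i are the observed values, \hat{y}_i the predictions and \bar{y} the
> mean of the observed values.
>
> ```latex
> R^2_{\mathrm{trad}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
> \qquad
> R^2_{\mathrm{corr}} = \left[\operatorname{cor}(y, \hat{y})\right]^2
> ```
>
> The two agree for an ordinary least-squares fit with an intercept that is
> evaluated on its own training data, but they can differ for held-out or
> cross-validated predictions.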
>
> Max
>
>
> On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>
>> I was under the impression that in PLS analysis, R2 was calculated as
>> 1 - (residual sum of squares) / (total sum of squares).  Is this still what
>> you are referring to?  I am aware of the linear R2, which measures how well
>> two variables are correlated, but the prior equation seems different to me.
>> Could you explain whether this is the same concept?
>>
>> Charles
>>
>>
>> On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>
>>> > Is there some literature on which you base that statement?
>>>
>>> No, but there isn't literature on changing a lightbulb with a duck
>>> either.
>>>
>>> > Are these papers incorrect in using these statistics?
>>>
>>> Definitely, if they convert 3+ categories to integers (but there are
>>> specialized R^2 metrics for binary classification models). Otherwise, they
>>> are just using an ill-suited "score".
>>>
>>> How would you explain such an R^2 value to someone? R^2 is
>>> a function of the correlation between two random variables. For two
>>> classes, one of them is binary. What does it mean?
>>>
>>> Historically, models rooted in computer science (e.g., neural networks)
>>> used RMSE or SSE to fit models with binary outcomes, and that *can*
>>> work well.
>>>
>>> However, I don't think that communicating R^2 is effective. Other
>>> metrics (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed
>>> to measure the ability of a model to classify and work well. With 3+
>>> categories, I tend to use Kappa.
>>>
>>> Max
>>>
>>>
>>>
>>>
>>> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>
>>>> Thank you for your response, Max.  Is there some literature on which you
>>>> base that statement?  I am confused, as I have seen many publications that
>>>> report R^2 and Q^2 following PLSDA analysis.  The analysis is usually meant
>>>> to discriminate groups (i.e., classification).  Are these papers incorrect
>>>> in using these statistics?
>>>>
>>>> Regards,
>>>> Charles
>>>>
>>>>
>>>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>>>
>>>>> Charles,
>>>>>
>>>>> You should not be treating the classes as numeric (is virginica
>>>>> really three times setosa?). Q^2 and/or R^2 are not appropriate for
>>>>> classification.
>>>>>
>>>>> Max
>>>>>
>>>>>
>>>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>>
>>>>>> I have discovered one of my errors.  The timematrix was unnecessary and
>>>>>> an unfortunate habit I brought from another package.  The following
>>>>>> provides the same R2 values, as it should; however, I still don't know
>>>>>> how to retrieve Q2 values.  Any insight would again be appreciated:
>>>>>>
>>>>>> library(caret)
>>>>>> library(pls)
>>>>>>
>>>>>> data(iris)
>>>>>>
>>>>>> # needed to convert to numeric in order to do regression
>>>>>> # I don't fully understand this, but if I left it as a factor I would
>>>>>> # get an error following the summary function
>>>>>> iris$Species=as.numeric(iris$Species)
>>>>>> inTrain1=createDataPartition(y=iris$Species,
>>>>>>     p=.75,
>>>>>>     list=FALSE)
>>>>>>
>>>>>> training1=iris[inTrain1,]
>>>>>> testing1=iris[-inTrain1,]
>>>>>>
>>>>>> ctrl1=trainControl(method="cv",
>>>>>>     number=10)
>>>>>>
>>>>>> plsFit2=train(Species~.,
>>>>>>     data=training1,
>>>>>>     method="pls",
>>>>>>     trControl=ctrl1,
>>>>>>     metric="Rsquared",
>>>>>>     preProc=c("scale"))
>>>>>>
>>>>>> data(iris)
>>>>>> training1=iris[inTrain1,]
>>>>>> datvars=training1[,1:4]
>>>>>> dat.sc=scale(datvars)
>>>>>>
>>>>>> pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
>>>>>>     ncomp=3, method="oscorespls", data=training1)
>>>>>>
>>>>>> x=crossval(pls.dat, segments=10)
>>>>>>
>>>>>> summary(x)
>>>>>> summary(plsFit2)
>>>>>>
>>>>>> Regards,
>>>>>> Charles
>>>>>>
>>>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>>>
>>>>>> > Greetings,
>>>>>> >
>>>>>> > I have been exploring the use of the caret package to conduct some
>>>>>> > plsda modeling.  Previously, I have come across methods that result in
>>>>>> > an R2 and Q2 for the model.  Using the 'iris' data set, I wanted to see
>>>>>> > if I could accomplish this with the caret package.  I use the following
>>>>>> > code:
>>>>>> >
>>>>>> > library(caret)
>>>>>> > library(pls)   # provides plsr() and crossval() used below
>>>>>> > data(iris)
>>>>>> >
>>>>>> > # needed to convert to numeric in order to do regression
>>>>>> > # I don't fully understand this, but if I left it as a factor I would
>>>>>> > # get an error following the summary function
>>>>>> > iris$Species=as.numeric(iris$Species)
>>>>>> > inTrain1=createDataPartition(y=iris$Species,
>>>>>> >     p=.75,
>>>>>> >     list=FALSE)
>>>>>> >
>>>>>> > training1=iris[inTrain1,]
>>>>>> > testing1=iris[-inTrain1,]
>>>>>> >
>>>>>> > ctrl1=trainControl(method="cv",
>>>>>> >     number=10)
>>>>>> >
>>>>>> > plsFit2=train(Species~.,
>>>>>> >     data=training1,
>>>>>> >     method="pls",
>>>>>> >     trControl=ctrl1,
>>>>>> >     metric="Rsquared",
>>>>>> >     preProc=c("scale"))
>>>>>> >
>>>>>> > data(iris)
>>>>>> > training1=iris[inTrain1,]
>>>>>> > datvars=training1[,1:4]
>>>>>> > dat.sc=scale(datvars)
>>>>>> >
>>>>>> > n=nrow(dat.sc)
>>>>>> > dat.indices=seq(1,n)
>>>>>> >
>>>>>> > timematrix=with(training1,
>>>>>> >         classvec2classmat(Species[dat.indices]))
>>>>>> >
>>>>>> > pls.dat=plsr(timematrix ~ dat.sc,
>>>>>> >     ncomp=3, method="oscorespls", data=training1)
>>>>>> >
>>>>>> > x=crossval(pls.dat, segments=10)
>>>>>> >
>>>>>> > summary(x)
>>>>>> > summary(plsFit2)
>>>>>> >
>>>>>> > I see two different R2 values and I cannot figure out how to get the
>>>>>> > Q2 value.  Any insight as to what my errors may be would be
>>>>>> > appreciated.
>>>>>> >
>>>>>> > Regards,
>>>>>> >
>>>>>> > --
>>>>>> > Charles
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Charles Determan
>>>>>> Integrated Biosciences PhD Student
>>>>>> University of Minnesota
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>



-- 
Charles Determan
Integrated Biosciences PhD Student
University of Minnesota
