Donald Catanzaro, PhD wrote:
Hi All,
I have done more background research (including Frank's book) so I feel
that my second question is answered. However, as a novice R user I
still have the following problem, accessing the output of predict. So
simplifying my question, using the example provided in the Design
package
(http://lib.stat.cmu.edu/S/Harrell/help/Design/html/predict.lrm.html) I
might do something like:
# See help for predict.Design for several binary logistic
# regression examples
# Examples of predictions from ordinal models
set.seed(1)
y <- factor(sample(1:3, 400, TRUE), 1:3, c('good','better','best'))
x1 <- runif(400)
x2 <- runif(400)
f <- lrm(y ~ rcs(x1,4)*x2)
predict(f, type="fitted.ind")[1:10,] #gets Prob(better) and all others
y=good y=better y=best
1 0.3124704 0.3631544 0.3243752
2 0.3676075 0.3594685 0.2729240
3 0.2198274 0.3437416 0.4364309
4 0.3063463 0.3629658 0.3306879
5 0.5171323 0.3136088 0.1692590
6 0.3050115 0.3629071 0.3320813
7 0.3532452 0.3612928 0.2854620
8 0.2933928 0.3621220 0.3444852
9 0.3068595 0.3629867 0.3301538
10 0.6214710 0.2612164 0.1173126
d <- data.frame(x1=.5,x2=.5)
predict(f, d, type="fitted") # Prob(Y>=j) for new observation
y>=better y>=best 0.6906593 0.3275849
predict(f, d, type="fitted.ind") # Prob(Y=j)
y=good y=better y=best 0.3093407 0.3630744 0.3275849
So now if I wanted to do
out <- predict(f, d, type="fitted.ind")>
out
y=good y=better y=best
0.3093407 0.3630744 0.3275849
out$"y=better"
Error in out$"y=better" : $ operator is invalid for atomic vectors
Recognize what type of object out is, and address its elements
accordingly. E.g. out[,'y=best'].
Frank
y=better is the max, so how do I create something that says that ?
(which is not exactly what I want to do but close enough to help me
figure out what R code I need to accomplish the task)
I can push the predictions out to a vector:
out.vector <- as.vector(predict(f, d, type="fitted.ind"))
out.vector
[1] 0.3093407 0.3630744 0.3275849
which gets me part of the way because I can find out max(out.vector) but
I still need to know what column the max is in. I think the problem is
that I don't know how to manipulate data frames and vectors in R and
need some guidance
-Don
Don Catanzaro, PhD Landscape Ecologist
dgcatanz...@gmail.com
16144 Sigmond Lane
Lowell, AR 72745
479-751-3616
Frank E Harrell Jr wrote:
Donald Catanzaro, PhD wrote:
Hi All,
I am working on some ordinal logistic regresssions using LRM in the
Design package. My response variable has three categories (1,2,3)
and after using the creating my model and using a call to predict
some values and I wanted to use a simple .5 cut-off to classify my
probabilities into the categories.
I had two questions:
a) first, I am having trouble directly accessing the probabilities
which may have more to do with my lack of experience with R
For instance, my calls
>ologit.three.NoPerFor <- lrm(Threshold.Three ~ TECI , data=CLD,
na.action=na.pass)
>CLD$Threshold.Predict.Three.NoPerFor<-
predict(ologit.three.NoPerFor, newdata=CLD, type="fitted.ind")
>CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Three.NoPerFor.Threshold.Three=1
> .5] <- 1
Error: unexpected '=' in
"CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Three.NoPerFor.Threshold.Three="
>
>
produce an error message and it seems as R does not like the equal
sign at all. So how does one access the probabilities so I can
classify them into the categories of 1,2,3 so I can look at
performance of my model ?
use == to check equality
b) which leads me to my next question. I thought that simply
calculating the percent correct off of my predictions would be
sufficient to look at performance but since my question is very much
in line with this thread
http://tolstoy.newcastle.edu.au/R/e4/help/08/04/8987.html I am not so
sure anymore. I am afraid I did not understand Frank Harrell's last
suggestion regarding improper scoring rule - can someone point me to
some internet resources that I might be able to review to see why my
approach would not be valid ?
Percent correct will give you misleading answers and is game-able. It
is also ultra-high-variance. Though not a truly proper scoring rule,
Somers' Dxy rank correlation (generalization of ROC area) is helpful.
Better still: use the log-likelihood and related quantities (deviance,
adequacy index as described in my book).
Frank
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.