On Thu, Aug 19, 2010 at 10:56 AM, Watling,James I <watli...@ufl.edu> wrote:
> Hi Steve--
>
> Thanks for your interest in helping me figure this out.  I think the problem 
> has to do with the values of the probabilities returned from the use of the 
> model to predict occurrence in a new dataframe.

Ok, so if you're sure this is the problem, and not, say, getting the
correct values for the predictor variables at a given point, then I'd
be a bit more thorough when building your model.

Originally you said:

> I have used a training dataset to train the model, and tested it against a 
> validation data set with good results: AUC is high, and the confusion matrix 
> indicates low commission and omission errors.

Maybe your original "good" AUC was just an artifact of your particular train/test split?

Why not use all of your data and do something like 10 fold cross
validation to find:

(1) Your average accuracy over your folds
(2) The best value for your cost parameter (how did you pick cost=10000?)
(3) Or even the best kernel to use.
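
For (1), here is a minimal sketch of a manual 10-fold cross validation.
It assumes you are using svm() from the e1071 package (you didn't say
which SVM implementation you have), and it uses a two-class subset of
iris as a stand-in for your data; the kernel and cost values are
placeholders, not recommendations.

```r
library(e1071)  # assumption: e1071 is the SVM implementation in use

set.seed(1)
# Stand-in two-class data; replace with your own presence/absence data frame.
d <- droplevels(subset(iris, Species != "setosa"))

k <- 10
# Randomly assign each row to one of k folds of (nearly) equal size.
fold <- sample(rep(seq_len(k), length.out = nrow(d)))

# Train on k-1 folds, test on the held-out fold, and record the accuracy.
acc <- sapply(seq_len(k), function(i) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  fit   <- svm(Species ~ ., data = train, kernel = "radial", cost = 1)
  mean(predict(fit, test) == test$Species)
})

mean(acc)  # average accuracy over the folds
sd(acc)    # spread across folds
```

If the per-fold accuracies vary a lot, that is a hint that a single
train/test split can look better (or worse) than the model really is.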

Doing (2) and (3) will likely be time consuming. To help with (2) you
might try looking at the svmpath package:

http://cran.r-project.org/web/packages/svmpath/index.html

It only works on 2-class classification problems, and (I think) using
a linear kernel (sorry, I don't remember offhand, but it's described in
the package help and the linked publications).

You don't need to use svmpath, but without it you'll need to define a
"grid" of C values (or a 2D grid, if your SVM + kernel combination has
more parameters, e.g. gamma for a radial kernel) and train over each of
these values. That takes lots of CPU time, but not much human time.
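
A grid search along these lines could be sketched as follows, again
assuming the e1071 package, whose tune.svm() runs the cross validation
for each parameter combination. The iris subset and the particular
cost/gamma ranges are only placeholders for illustration.

```r
library(e1071)  # assumption: e1071 is the SVM implementation in use

set.seed(1)
# Stand-in two-class data; replace with your own data frame.
d <- droplevels(subset(iris, Species != "setosa"))

# 2D grid over cost and gamma, each combination scored by 10-fold CV.
tuned <- tune.svm(Species ~ ., data = d,
                  kernel = "radial",
                  cost   = 10^(-1:4),
                  gamma  = 10^(-3:0),
                  tunecontrol = tune.control(sampling = "cross", cross = 10))

tuned$best.parameters   # cost/gamma pair with the lowest CV error
tuned$best.performance  # the corresponding CV classification error
head(tuned$performances)  # error and dispersion for each grid point
```

To compare kernels as in (3), you could repeat the call with a different
kernel argument and compare the best.performance values.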

Does that make sense?

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
