Howdy,

On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
<snip>
> Yes, exactly that. In your example, though, the variation seems to be a lot smaller. I'm guessing that has to do with the data.

> If I instead output the decision values, the whole procedure is fully reproducible, i.e. the exact same values are returned when I retrain the model.

By the decision values, you mean the predicted labels, right?

> I have no idea how the probabilities are calculated, but it seems to be in this step that the differences arise. In my case, I feel a bit hesitant to use them when they differ that much between runs (15% or so)...

I'd find that a bit disconcerting, too. Can you give a sample of your data plus the code you're using, so that we can reproduce this?

Warning: Brainstorming Below

If I were to calculate probabilities for my class labels, I'd make the probability some function of the example's distance from the decision boundary.
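To make that idea concrete, here is a minimal sketch (in Python, just for illustration; it is not what e1071 does internally). The function name and the shape parameters `a` and `b` are made up for this example: a sigmoid squashes the signed distance from the boundary into a value between 0 and 1.

```python
import math

def sigmoid_probability(decision_value, a=-1.0, b=0.0):
    """Map a signed distance from the decision boundary to P(class = +1).

    a and b are hypothetical shape parameters picked for illustration.
    With a < 0, larger positive decision values give probabilities nearer 1.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

# Points far on the negative side, on the boundary, and far on the positive side.
for f in (-2.0, 0.0, 2.0):
    print(f, round(sigmoid_probability(f), 3))
```

With a fixed boundary and fixed `a` and `b`, this mapping is deterministic, so any run-to-run variation would have to come from how those two parameters are estimated.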

Now, if your decision boundary isn't changing from run to run (and I guess it really shouldn't be, since the SVM returns the maximum margin classifier (which is, by definition, unique, right?)), it's hard to imagine why these probabilities would change, either ...

... unless you're holding out different subsets of your data during training, or perhaps have a different value for your penalty (cost) parameter when building the model. I believe you said that you're actually training the same exact model each time, though, right?

Anyway, I see the help page for ?svm says this, if it helps:

"The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization"
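Here is a rough sketch of what "fits a logistic distribution using maximum likelihood to the decision values" could look like for a single binary classifier (Python, standard library only; all names and the toy data are made up, and this is not e1071's or libsvm's code). As far as I know, libsvm gets the decision values for this fit from an internal cross-validation on randomly shuffled folds; treat that as an assumption to check against the source. The sketch mimics the effect by fitting on different random subsets of the same decision values.

```python
import math
import random

def probability(f, a, b):
    """Sigmoid mapping from decision value f to P(class = +1)."""
    return 1.0 / (1.0 + math.exp(a * f + b))

def fit_sigmoid(decision_values, labels, steps=2000, rate=0.1):
    """Fit a and b by gradient descent on the negative log-likelihood.

    labels are 1 for the positive class and 0 for the negative class.
    """
    a, b = 0.0, 0.0
    n = len(decision_values)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(decision_values, labels):
            p = probability(f, a, b)
            # Gradient of the negative log-likelihood w.r.t. a and b.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= rate * grad_a / n
        b -= rate * grad_b / n
    return a, b

def fit_on_random_subset(decision_values, labels, seed, size=40):
    """Fit the sigmoid on a random subset, standing in for a random fold split."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(decision_values)), size)
    return fit_sigmoid([decision_values[i] for i in idx],
                       [labels[i] for i in idx])

# Toy, overlapping decision values: positives centred at +1, negatives at -1.
data_rng = random.Random(0)
labels = [1] * 40 + [0] * 40
decision_values = [data_rng.gauss(1.0 if y == 1 else -1.0, 1.0) for y in labels]

# Same decision values, different random subsets: compare P(+1) at f = 0.5.
for seed in (1, 2):
    a, b = fit_on_random_subset(decision_values, labels, seed)
    print(seed, round(probability(0.5, a, b), 3))
```

If something like this is going on, the decision values themselves would stay identical between runs while the fitted probabilities move with the random split. Fixing the seed makes the toy version repeatable; whether set.seed() has the same effect on svm() is something I'd test rather than assume.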

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
