Hi Jack,

Regarding 1) and 2): they are telling you the same thing. I recommend you read the first sections of the article; it is very well written and clear. There you will read about duality.
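In short, the problem the solver actually optimizes is the dual, which in the standard soft-margin form (this is textbook material, covered e.g. in the Bennett and Campbell paper cited below, not anything specific to e1071) reads

\[
\max_{\alpha}\; \sum_i \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, x_i^{\top} x_j
\qquad \text{s.t.} \qquad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0 .
\]

The support vectors are exactly the training points with \( \alpha_i > 0 \). Since each \( \alpha_i \) is capped at C, roughly speaking a small C forces the weight to be spread across many points (many SVs), while a large C lets a few points carry it (fewer SVs).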
To 3): I interpret the scatter plots like this: "Increasing the value of C (...) forces the creation of a more accurate model", and a more accurate model is built by adding more SVs (until we get a convex hull of the data).
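If it helps to see the effect numerically rather than in the plots, something along these lines should print the SV count for each cost (an untested sketch using the iris data that ships with R, not your breast-cancer data):

library(e1071)

## two-class subset of iris (versicolor vs. virginica), so a linear
## C-classification SVM applies
d <- subset(iris, Species != "setosa")
d$Species <- factor(d$Species)  # drop the unused factor level

## fit one model per cost and report how many support vectors it keeps
for (cost in 10^(-3:3)) {
  m <- svm(Species ~ Petal.Length + Petal.Width, data = d,
           type = "C-classification", kernel = "linear", cost = cost)
  cat("cost:", cost, " #SV:", nrow(m$SV), "\n")
}

You should see the count shrink as the cost grows, just as in the model1/model2/model3 output below.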
Hope it helps.

Regards,
Pau

2010/7/14 Jack Luo <jluo.rh...@gmail.com>

> Pau,
>
> Thanks a lot for your email, I found it very helpful. Please see below
> for my reply, thanks.
>
> -Jack
>
> On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar
> <paucar...@gmail.com> wrote:
>
>> Hello Jack,
>>
>> 1) Why did you think that "larger C is more prone to overfitting than
>> smaller C"?
>
> *There is a statement at the link http://www.dtreg.com/svm.htm:
>
> "To allow some flexibility in separating the categories, SVM models have
> a cost parameter, C, that controls the trade off between allowing training
> errors and forcing rigid margins. It creates a soft margin that permits
> some misclassifications. Increasing the value of C increases the cost of
> misclassifying points and forces the creation of a more accurate model
> that may not generalize well."
>
> My understanding is that this means larger C may not generalize well
> (i.e. is prone to overfitting).*
>
>> 2) If you look at the formulation of the quadratic programming problem,
>> you will see that C rules the error of the "cutting plane" (and the
>> overfitting). Therefore, for high C you allow the "cutting plane" to cut
>> the set worse, so the SVM needs fewer points to build it. A proper
>> explanation is in Kristin P. Bennett and Colin Campbell, "Support Vector
>> Machines: Hype or Hallelujah?", SIGKDD Explorations 2(2), 2000, 1-13.
>> http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf
>
> *Could you be more specific about this? I don't quite understand.*
>
>> 3) You might find these plots useful:
>>
>> library(e1071)
>>
>> ## 20 points in two classes: columns 1-2 are the coordinates,
>> ## column 3 is the class label
>> m1 <- matrix(c(
>>   0, 0, 0, 1, 1, 2, 1, 2, 3, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3,
>>   1, 2, 3, 2, 3, 3, 0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 0, 1, 2, 3,
>>   1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1
>> ), ncol = 3)
>>
>> Y <- m1[, 3]
>> X <- m1[, 1:2]
>> df <- data.frame(X, Y)
>>
>> ## one panel per cost: green/blue mark the two classes, red marks the
>> ## support vectors of the fitted model
>> par(mfcol = c(4, 2))
>> for (cost in c(1e-3, 1e-2, 1e-1, 1e0, 1e+1, 1e+2, 1e+3)) {
>>   model.svm <- svm(Y ~ ., data = df, type = "C-classification",
>>                    kernel = "linear", cost = cost, scale = FALSE)
>>   plot(x = 0, ylim = c(0, 5), xlim = c(0, 3),
>>        main = paste("cost:", cost, "#SV:", nrow(model.svm$SV)))
>>   points(m1[m1[, 3] > 0, 1], m1[m1[, 3] > 0, 2], pch = 3, col = "green")
>>   points(m1[m1[, 3] < 0, 1], m1[m1[, 3] < 0, 2], pch = 4, col = "blue")
>>   points(model.svm$SV[, 1], model.svm$SV[, 2], pch = 18, col = "red")
>> }
>
> *Thanks a lot for the code, I really appreciate it. I've run it, but I am
> not sure how I should interpret the scatter plots, although it is obvious
> that the number of SVs decreases as the cost increases.*
>
>> Regards,
>> Pau
>>
>> 2010/7/14 Jack Luo <jluo.rh...@gmail.com>
>>
>>> Hi,
>>>
>>> I have a question about the parameter C (cost) in the svm function in
>>> e1071. I thought larger C was more prone to overfitting than smaller C,
>>> and hence would lead to more support vectors. However, using the
>>> Wisconsin breast cancer example at the link
>>> http://planatscher.net/svmtut/svmtut.html
>>> I found that the largest cost has the fewest support vectors, which is
>>> contrary to what I expected. Please see the scripts below. Am I
>>> misunderstanding something here?
>>>
>>> Thanks a bunch,
>>>
>>> -Jack
>>>
>>> > model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 0.01)
>>> > model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
>>> > model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 100)
>>> > model1
>>>
>>> Call:
>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>     cost = 0.01)
>>>
>>> Parameters:
>>>    SVM-Type:  C-classification
>>>  SVM-Kernel:  linear
>>>        cost:  0.01
>>>       gamma:  0.1111111
>>>
>>> Number of Support Vectors:  99
>>>
>>> > model2
>>>
>>> Call:
>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>     cost = 1)
>>>
>>> Parameters:
>>>    SVM-Type:  C-classification
>>>  SVM-Kernel:  linear
>>>        cost:  1
>>>       gamma:  0.1111111
>>>
>>> Number of Support Vectors:  46
>>>
>>> > model3
>>>
>>> Call:
>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>     cost = 100)
>>>
>>> Parameters:
>>>    SVM-Type:  C-classification
>>>  SVM-Kernel:  linear
>>>        cost:  100
>>>       gamma:  0.1111111
>>>
>>> Number of Support Vectors:  44