It seems to me that if the split is done once for all parameter choices, the
bias will be big, and if it is done once for each choice of parameters, the
variance will be big.

In LIBSVM, for each choice of (C, gamma), the grid-search script grid.py calls
svm_cross_validation(), which makes a random split of the dataset. So it seems
to me it uses the second method.

As to the first method, I came across it in Chapter 7, Section 10 of "The
Elements of Statistical Learning" by Hastie et al., where it says to first
split the dataset, then evaluate the validation error CV(alpha) while varying
the complexity parameter alpha to find the value giving the smallest
validation error. It appears to me that the splitting there is done once for
all choices of the complexity parameter.
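
Under the same assumptions (data frame dat with factor response y, e1071::svm
standing in for the classifier), a sketch of that first strategy would draw
the fold assignment once, outside the loop over the grid, so every (cost,
gamma) pair is compared on exactly the same partitions:

set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))  # drawn once, shared below

cv_error_fixed <- function(cost, gamma) {
  errs <- sapply(1:k, function(i) {
    fit <- svm(y ~ ., data = dat[folds != i, ],
               kernel = "radial", cost = cost, gamma = gamma)
    mean(predict(fit, dat[folds == i, ]) != dat$y[folds == i])
  })
  mean(errs)
}

grid <- expand.grid(cost = 2^(-1:3), gamma = 2^(-3:1))
grid$cv_err <- mapply(cv_error_fixed, grid$cost, grid$gamma)
grid[which.min(grid$cv_err), ]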

Thanks!

--- On Sun, 7/12/09, Tim <timlee...@yahoo.com> wrote:

> From: Tim <timlee...@yahoo.com>
> Subject: [R] Splitting dataset for Tuning Parameter with Cross Validation
> To: r-h...@stat.math.ethz.ch
> Date: Sunday, July 12, 2009, 6:58 PM
> 
> Hi,
> My question might be a little general.
> 
> I have a number of values to select for the complexity
> parameters in some classifier, e.g. the C and gamma in SVM
> with RBF kernel. The selection is based on which values give
> the smallest cross validation error.
> 
> I wonder if the randomized splitting of the available
> dataset into folds is done only once for all those choices
> for the parameter values, or once for each choice? And why?
> 
> Thanks and regards!
> 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
