Hi,

On Fri, Jan 7, 2011 at 2:10 AM, Noah Silverman <n...@smartmediacorp.com> wrote:
> I have a data set with about 30,000 training cases and 103 variables.
>
> I've trained an SVM (using the e1071 package) as a binary classifier {0,1}.
> The accuracy isn't great.
>
> I used a grid search over the C and G (gamma) parameters with an RBF kernel
> to find the best settings.
>
> I remember that for least squares, R has a nice stepwise function that will
> try combining subsets of variables to find the optimal result. Clearly,
> this doesn't exist for SVMs as a built-in function.
>
> As an experiment, I simply grabbed the first 50 variables and repeated the
> training/grid search procedure. The results were significantly better.
> Since the data is VERY noisy, my guess is that eliminating some of the
> variables removed some noise, which led to the better results.
>
> With a grid of 100 parameter settings (10 for C, 10 for G) and 106
> variables, trying every combination would be prohibitively time-consuming.
>
> Can anyone suggest an approach to find the ideal subset of variables for my
> SVM classifier?
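To put "prohibitively time-consuming" in numbers, here is a rough count of model fits, with p the number of variables and each candidate subset tuned over the 10 x 10 (C, G) grid. The greedy forward-selection count is included only for comparison: it is the SVM analogue of the stepwise approach mentioned in the question, not something proposed in this thread. Counts are per fit, before multiplying by any cross-validation folds, and p = 103 is taken from the first line of the question.

```latex
% Exhaustive search: every non-empty subset of p variables, 100 grid points each
N_{\text{exhaustive}} = 100\,(2^{p} - 1)

% Greedy forward selection: at most p + (p-1) + \dots + 1 subsets
N_{\text{forward}} \le 100 \cdot \frac{p(p+1)}{2}

% For p = 103:
N_{\text{exhaustive}} = 100\,(2^{103} - 1) \approx 1.0 \times 10^{33},
\qquad
N_{\text{forward}} \le 100 \cdot \frac{103 \cdot 104}{2} = 535{,}600
```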
Sounds like a job for the types of approaches found in the penalizedSVM package (feature selection for SVMs via penalties such as the L1-norm and SCAD):

http://cran.r-project.org/web/packages/penalizedSVM/index.html

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
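For comparison with the penalized approach, here is a minimal sketch of greedy forward selection, a swapped-in wrapper technique that mimics the stepwise idea from the question. It assumes a hypothetical `score(subset)` callable that returns a validation metric (higher is better). In the real setting that would be something like cross-validated SVM accuracy at the best (C, gamma) for that subset; in R, e1071's `svm(..., cross = 10)` reports `tot.accuracy`. The `toy_score` function below is made up purely to exercise the loop.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy forward selection.

    Repeatedly adds the single remaining feature whose addition gives the
    highest score; stops when no addition improves on the current best or
    when max_features is reached. Calls score() at most p(p+1)/2 times.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        # Ties on score are broken by the larger feature name.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best_score:
            break  # no single addition helps, so stop early
        selected.append(cand_feature)
        remaining.remove(cand_feature)
        best_score = cand_score

    return selected, best_score


# Hypothetical toy metric: x1 and x3 carry signal, every other feature is
# noise that slightly hurts the score.
INFORMATIVE = frozenset({"x1", "x3"})


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & INFORMATIVE) - 0.1 * len(subset - INFORMATIVE)


if __name__ == "__main__":
    print(forward_select(["x1", "x2", "x3", "x4"], toy_score))
```

With a real SVM score, keep a held-out test set that plays no part in the selection loop: choosing features with the same cross-validation folds used to report accuracy gives an optimistic estimate.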