Hi,

On Thu, Jun 24, 2010 at 1:22 PM, Changbin Du <changb...@gmail.com> wrote:
> HI, GUYS,
>
> I used the following codes to run SVM and get prediction on new data set hh.
>
> dim(all_h)
> [1] 2034   24
> dim(hh)  # it contains all the variables besides the variables in all_h data set.
> [1] 640 415

If I understand you correctly, this is wrong. You are supposed to hold
out *observations* (rows) when doing training/testing, not
variables/predictors/features (cols).

Assuming e1071::svm doesn't do anything fancy with matching column
names between training and testing, then, put simply: the number of
columns (features per observation) you use in training should be the
same as the number of columns in your test set.

-steve

> require(e1071)
>
> svm.tune<-tune(svm, as.factor(out) ~ ., data=all_h,
> ranges=list(gamma=2^(-5:5), cost=2^(-5:5))) # find the best parameters.
>
> bestg<-svm.tune$best.parameters[[1]]
> bestc<-svm.tune$best.parameters[[2]]
>
> svm.fit<-svm(as.factor(out) ~ ., data=all_h, method="C-classification",
> kernel="radial", probability = TRUE, cost=bestc, gamma=bestg, cross=10) # model fitting
>
> svm.pred<-predict(svm.fit, hh, decision.values = TRUE, probability = TRUE) # find the probability.
> *
> Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames =
> list(rowns,  :
>   invalid 'ncol' value (too large or NA)*
>
>
>> head(all_h)
>        DD HK HQ      IL      LP      NE      NP      TA      TP      WA WC
> 1 0.00543  0  0 0.00815 0.00272 0.00543 0.00000 0.00000 0.00000 0.00000  0
> 3 0.00000  0  0 0.00890 0.00890 0.00712 0.00534 0.00000 0.00890 0.00178  0
> 4 0.00448  0  0 0.00448 0.00299 0.00448 0.00149 0.00299 0.00000 0.00149  0
> 5 0.00312  0  0 0.00467 0.00467 0.00000 0.00156 0.00467 0.00312 0.00467  0
> 6 0.00587  0  0 0.02053 0.00587 0.00000 0.00293 0.00587 0.00293 0.00000  0
> 7 0.00000  0  0 0.02422 0.00346 0.00000 0.00346 0.00346 0.00000 0.00346  0
>        WD      WG      WN      YW   acid_per   base_per  charge_per
> 1 0.00000 0.00000 0.00000 0.00000 0.14402174 0.12228261 0.019021739
> 3 0.00178 0.00178 0.00534 0.00178 0.12277580 0.09252669 0.016014235
> 4 0.00149 0.00448 0.00448 0.00000 0.16591928 0.11509716 0.022421525
> 5 0.00000 0.00156 0.00000 0.00156 0.13084112 0.10903427 0.009345794
> 6 0.00293 0.00000 0.00000 0.00000 0.07038123 0.08797654 0.002932551
> 7 0.00000 0.00346 0.00000 0.00346 0.05536332 0.08650519 0.010380623
>   hydrophob_per polar_per num_cell num_genes position out
> 1     0.3804348 0.1929348        1         4        1   0
> 3     0.3540925 0.2508897        1         4        3   0
> 4     0.3393124 0.2032885        1         4        4   1
> 5     0.3753894 0.2305296        2         7        1   0
> 6     0.4868035 0.1964809        2         7        2   0
> 7     0.4878893 0.1522491        2         7        3   0
>
>> quantile(hh$HK)
>      0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.02703
>> quantile(hh$HQ)
>    0%   25%   50%   75%  100%
> 0.000 0.000 0.000 0.000 0.025
>> quantile(hh$WC)
>      0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.01266
>
> Can someone give some suggestions?
>
> Thanks!
>
> --
> Sincerely,
> Changbin
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
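
P.S. The column-matching point above can be checked with a few lines of base R before calling predict(). This is only a rough sketch under stated assumptions: `train` and `test` are made-up toy stand-ins for all_h and hh, `extra1`/`extra2` are hypothetical surplus columns, and the response is assumed to be called `out` as in the quoted code. It is not a diagnosis of the specific error in the thread.

```r
# Toy stand-ins for all_h (training) and hh (new data); values are random
# and the column names extra1/extra2 are hypothetical.
set.seed(1)
train <- data.frame(DD = runif(6), HK = runif(6), WC = runif(6),
                    out = c(0, 0, 1, 0, 1, 0))
test  <- data.frame(WC = runif(4), extra1 = runif(4), DD = runif(4),
                    HK = runif(4), extra2 = runif(4))

# Predictors are every training column except the response.
predictors <- setdiff(names(train), "out")

# Which training predictors are absent from the new data? Stop early if any.
missing_cols <- setdiff(predictors, names(test))
if (length(missing_cols) > 0) {
  stop("new data lacks: ", paste(missing_cols, collapse = ", "))
}

# Keep only the training predictors, in the same order as in training.
test_aligned <- test[, predictors, drop = FALSE]

dim(test_aligned)    # same number of feature columns as used in training
names(test_aligned)
```

With real data, the aligned data frame (here `test_aligned`) is what would be passed as the new data to predict() in place of the full 415-column hh.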