Hi All,

Thank you for your replies so far. I was hoping to get some more input from you on this issue. It seems I have hit a dead end here and would really appreciate some feedback. I have followed all the suggestions you mentioned, but I am still stuck. Earlier I thought that it was a "factor" issue, but now even that is not the error. Here is the script and the error. Thanks for your help. I have attached the sample test file as well as the training file in case you would like to run it locally.

---------------------------------
library(seqinr)
library("kernlab")

### Reading in the data
mars500_1_fasta = read.fasta("toClassify500_1.fasta")
mars500_1_seq = t(getSequence(mars500_1_fasta)) # get the sequences from the fasta object
mars500_1_df = as.data.frame(mars500_1_seq,stringsAsFactors=FALSE) # convert it to a data frame
class = append(rep("+",times=128),rep("-",times=128)) # add the Class field to the data frame for classification
mars500_1_df = cbind(Class=class,mars500_1_df)
mars500_1_df = data.frame(lapply(mars500_1_df,factor)) # finally apply the factor() function

##### Call the ksvm() function to create a model
mars500_1 <- ksvm(Class ~ ., data = mars500_1_df, kernel = "rbfdot", kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)

testSeq_fa=read.fasta("temp1.fasta")
testSeq_seq=t(getSequence(testSeq_fa))
testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
testSeq_df = cbind(Class="-",testSeq_df)
testSeq_df = data.frame(lapply(testSeq_df,factor))

predict(mars500_1,testSeq_df)
Error in .local(object, ...) : test vector does not match model !

Thanks in advance.

Sincerely,
Vishal

On Fri, Dec 25, 2009 at 8:10 AM, Vishal Thapar <vishaltha...@gmail.com> wrote:

> Hi,
>
> I seem to have made some headway on this problem, but it's still not solved.
> It seems like this is a "factor" issue. When I read my training set, I read
> it with read.csv(), which converts each of the columns to "factors". If I
> take a single row from this as my testSeq, it works great. On the other
> hand, when I read in my test sequence from a FASTA file, I am using the
> "seqinr" package's function read.fasta(), or if I read a sequence directly
> from a file I am using scan(). E.g.:
>
> train500 = read.csv("toClassify500_1.csv",header=TRUE) # reading the training set
> modelforSVM <- ksvm(Class ~ ., data = train500, kernel = "rbfdot", kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)
>
> Now if I do:
>
> tindex =sample(1:dim(train500)[1], 1)
> testSeq=train500[tindex,]
> predict(modelforSVM, testSeq);
>
> It works great.
>
> BUT if I do:
>
> my.file=file("chr4_seqs.fasta", open="r")
> chr4Seq = scan(my.file,list("",""),nlines=2) # read the data from a fasta file using scan()
>
> seqId = chr4Seq[[1]];
> testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
> # the s2c function just converts the "STRING" to char vector "S" "T" "R" "I" "N" "G"
>
> predict(modelforSVM, testSeq);
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>   contrasts can be applied only to factors with 2 or more levels
> -------------------------
> If I apply factor() to testSeq, it still doesn't work. E.g.:
>
> testSeq=data.frame(lapply(testSeq,factor))
>
> I still get:
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>   contrasts can be applied only to factors with 2 or more levels
>
> Another thing I tried was reading the fasta file using the read.fasta()
> function and taking a sample input from the training set itself:
>
> data500_1_fasta = read.fasta("toClassify500.fasta") # read a fasta file via the seqinr package
> data500_1_seq = t(getSequence(data500_1_fasta)) # get the sequences from it, 256 sequences, first 128 are +, next 128 are -
> data500_1_df = as.data.frame(data500_1_seq) # make a data frame from it
> class = append(rep("+",times=128),rep("-",times=128)) # add the class column to it
> data500_1_df = cbind(Class=class,data500_1_df)
> data500_1_df = data.frame(lapply(data500_1_df,factor)) # finally apply factor() on the data frame
>
> # Now train and get the model
>
> modelforSVM <- ksvm(Class ~ ., data = data500_1_df, kernel = "rbfdot", kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)
>
> and finally:
>
> tindex =sample(1:dim(data500_1_df)[1], 1)
> testSeq=data500_1_df[tindex,]
>
> predict(modelforSVM, testSeq);
>
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>   contrasts can be applied only to factors with 2 or more levels
>
> I am very confused at this point. What am I doing wrong? How do I use the
> factor() function properly so that I don't get this error? Am I heading in
> the right direction at all?
>
> Thanks in anticipation of your help.
>
> -vishal
>

--
Vishal Thapar, Ph.D.
Post Doctoral Researcher
Cold Spring Harbor Lab
Williams Bldg
1 Bungtown Road
Cold Spring Harbor, NY - 11724
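
For anyone trying to reproduce the "factor" symptom without the attached files, here is a minimal sketch in base R only. It is a guess at the mechanism, not a confirmed diagnosis of the ksvm() error above. The toy data frame and the names train, test_raw, test_bad, test_ok and predictors are made up for illustration. The assumption is that calling factor() on a one-row test frame gives every column a single level, so its dummy-variable expansion cannot line up with the training data unless the training levels are reused.

```r
# Toy data standing in for the sequence columns (hypothetical, not the attached files)
train <- data.frame(
  Class = factor(c("+", "+", "-", "-")),
  V1 = factor(c("a", "c", "g", "t")),
  V2 = factor(c("a", "a", "c", "g"))
)
predictors <- setdiff(names(train), "Class")

# A single test sequence, read in as plain characters
test_raw <- data.frame(V1 = "g", V2 = "a", stringsAsFactors = FALSE)

# factor() applied to the test frame on its own: each column gets exactly one level
test_bad <- data.frame(lapply(test_raw, factor))
nlevels(test_bad$V1)  # 1

# Expanding one-level factors into dummy columns raises the "contrasts" error
bad_result <- tryCatch(model.matrix(~ ., data = test_bad), error = function(e) e)
inherits(bad_result, "error")  # TRUE

# Rebuild each test column using the levels seen in the training data
test_ok <- test_raw
for (col in predictors) {
  test_ok[[col]] <- factor(test_raw[[col]], levels = levels(train[[col]]))
}

# The test row now expands to the same number of design columns as the training data
ncol(model.matrix(~ ., data = test_ok)) ==
  ncol(model.matrix(~ ., data = train[predictors]))  # TRUE
```

If this is what is happening, building the test factors with levels = levels(...) taken from the matching training column might be worth trying before predict(). A test value never seen in training would become NA under this approach.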