Hi, I have made some headway on this problem, but it is still not solved. It looks like a "factor" issue. I read my training set with read.csv(), which converts each column to a factor. If I take a single row of that data frame as my testSeq, predict() works fine. On the other hand, when I read my test sequence from a FASTA file with the "seqinr" package's read.fasta() function, or read a sequence directly from a file with scan(), predict() fails. For example:

library(kernlab)  # provides ksvm()
library(seqinr)   # provides read.fasta(), getSequence(), s2c()

train500 = read.csv("toClassify500_1.csv", header=TRUE)  # reading the training set
modelforSVM <- ksvm(Class ~ ., data = train500, kernel = "rbfdot",
                    kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)

Now if I do:

tindex = sample(1:dim(train500)[1], 1)
testSeq = train500[tindex,]
predict(modelforSVM, testSeq)

it works great. BUT if I do:

my.file = file("chr4_seqs.fasta", open="r")
chr4Seq = scan(my.file, list("",""), nlines=2)  # read the data from a fasta file using scan()
close(my.file)
seqId = chr4Seq[[1]]
# the s2c function just converts the "STRING" to the char vector "S" "T" "R" "I" "N" "G"
testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
predict(modelforSVM, testSeq)

I get:

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels

-------------------------

If I apply factor() to testSeq, it still doesn't work, e.g.:

testSeq = data.frame(lapply(testSeq, factor))

I still get:

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels

Another thing I tried was reading the FASTA file with the read.fasta() function and taking a sample input from the training set itself:

data500_1_fasta = read.fasta("toClassify500.fasta")  # read a fasta file via the seqinr package
# get the sequences from it: 256 sequences, first 128 are +, next 128 are -
data500_1_seq = t(getSequence(data500_1_fasta))
data500_1_df = as.data.frame(data500_1_seq)  # make a data frame from it
class = append(rep("+", times=128), rep("-", times=128))  # add the class column to it
data500_1_df = cbind(Class=class, data500_1_df)
data500_1_df = data.frame(lapply(data500_1_df, factor))  # finally apply factor() on the data frame

# Now train and get the model
modelforSVM <- ksvm(Class ~ ., data = data500_1_df, kernel = "rbfdot",
                    kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)

and finally:

tindex = sample(1:dim(data500_1_df)[1], 1)
testSeq = data500_1_df[tindex,]
predict(modelforSVM, testSeq)

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels

I am very confused at this point. What am I doing wrong? How do I use the factor() function properly so that I don't get this error? Am I headed in the right direction at all?

Thanks in anticipation of your help.
-vishal

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
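
For reference, the error message quoted above can be reproduced in base R without kernlab or the sequence files. The sketch below uses made-up toy data (the names train, from_train, fresh and aligned are hypothetical, not from the original code). It assumes the cause is that a one-row data frame built from scratch has single-level factors, and it shows one possible way to re-declare the columns with the training levels. It does not confirm that this is what happens inside predict() for the ksvm model, nor does it explain the third case, where the row is taken from the refactored training frame.

```r
# Toy training frame: every position column is a factor with levels A, C, G, T
# (an assumed stand-in for the real training set).
train <- data.frame(
  Class = factor(c("+", "+", "-", "-")),
  V1    = factor(c("A", "C", "G", "T")),
  V2    = factor(c("T", "G", "C", "A"))
)

# A row subset of the training frame keeps the full level set of each factor.
from_train <- train[1, ]
nlevels(from_train$V1)  # 4

# A one-row frame built from raw characters has exactly one level per column.
fresh <- data.frame(V1 = "G", V2 = "A", stringsAsFactors = TRUE)
nlevels(fresh$V1)  # 1

# Building a design matrix from single-level factors raises the contrasts error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(model.matrix(~ ., data = fresh),
                error = function(e) conditionMessage(e))

# Possible fix: re-declare each column using the levels seen in training.
aligned <- fresh
for (col in names(aligned)) {
  aligned[[col]] <- factor(as.character(aligned[[col]]),
                           levels = levels(train[[col]]))
}
mm <- model.matrix(~ ., data = aligned)  # now succeeds
```

If this assumption holds for the real data, the same loop could be applied to testSeq with train500 in place of train, provided the column names of the two data frames match.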