Hi,
I seem to have made some headway on this problem, but it's still not solved.
It seems like this is a "factor" issue. When I read my training set, I read
it with read.csv(), which converts each of the columns to a factor. If I
then take a single row of it as my testSeq, it works great. On the other
hand, when I read in my test sequence from a Fasta file, I am using the
"seqinr" package's function "read.fasta()", or, if I read a sequence directly
from a file, I am using "scan()". For example:
library(kernlab)  # provides ksvm()
library(seqinr)   # provides s2c(), read.fasta(), getSequence()
# read the training set
train500 = read.csv("toClassify500_1.csv",header=TRUE)
modelforSVM <- ksvm(Class ~ ., data = train500, kernel = "rbfdot", kpar =
"automatic", C = 60, cross = 3, prob.model = TRUE)
Now if I do:
tindex =sample(1:dim(train500)[1], 1)
testSeq=train500[tindex,]
predict(modelforSVM, testSeq);
It works great.
BUT if I do:
my.file=file("chr4_seqs.fasta", open="r")
# read the data from a fasta file using scan()
chr4Seq = scan(my.file,list("",""),nlines=2)
seqId = chr4Seq[[1]];
# the s2c function just converts the "STRING" to char vector "S" "T" "R" "I" "N" "G"
testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
predict(modelforSVM, testSeq);
Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") : contrasts can
be applied only to factors with 2 or more levels
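To illustrate what I suspect is happening, here is a minimal sketch in base R. The toy data frame and its column names are made up for illustration (they are not my real training set), and strsplit() stands in for s2c(). A row subset from the training frame keeps all the factor levels, while a row built from a single raw string only knows the one letter it contains in each column:

```r
# Toy stand-in for the training set (hypothetical data, not the real CSV).
train <- data.frame(
  Class = factor(c("+", "-", "+", "-")),
  V1 = factor(c("A", "C", "G", "T")),
  V2 = factor(c("T", "G", "C", "A"))
)

# A row taken from the training frame keeps every level of every factor.
fromTrain <- train[1, ]
print(sapply(fromTrain, nlevels))

# A one-row frame built from a raw string: each column becomes a factor
# whose only level is the single letter found in that position.
chars <- strsplit("AT", "")[[1]]  # base-R stand-in for seqinr's s2c()
fromString <- as.data.frame(t(chars), stringsAsFactors = TRUE)
print(sapply(fromString, nlevels))
```

If that is right, the one-level columns in the second frame would be what the contrasts error is complaining about.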
-------------------------
If I apply factor() to testSeq, it still doesn't work. For example:
testSeq=data.frame(lapply(testSeq,factor))
I still get Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
contrasts can be applied only to factors with 2 or more levels
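My guess is that a plain factor() call only sees the single letter in each column, so every column still ends up with one level. A sketch of what might be needed instead, assuming the test row has the same number of positions as the training set has predictor columns: pass the training levels explicitly. The toy data and names below are hypothetical, and I have not confirmed that this is what predict() expects:

```r
# Toy stand-in for the training set (hypothetical data, not the real CSV).
train <- data.frame(
  Class = factor(c("+", "-", "+", "-")),
  V1 = factor(c("A", "C", "G", "T")),
  V2 = factor(c("T", "G", "C", "A"))
)

# Characters of the new sequence; strsplit() stands in for seqinr's s2c().
chars <- strsplit("AT", "")[[1]]

# Predictor columns are everything except the response.
predictors <- setdiff(names(train), "Class")
stopifnot(length(chars) == length(predictors))

# Build each test column as a factor that carries the training levels.
cols <- lapply(seq_along(predictors), function(i) {
  factor(chars[i], levels = levels(train[[predictors[i]]]))
})
names(cols) <- predictors
testRow <- as.data.frame(cols)

print(sapply(testRow, nlevels))
```

One caveat with this approach: a letter that never occurs in the corresponding training column would silently become NA.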
Another thing I tried was reading the fasta file using the read.fasta()
function and taking a sample input from the training set itself:
# read a fasta file via the seqinr package
data500_1_fasta = read.fasta("toClassify500.fasta")
# get the sequences from it: 256 sequences, first 128 are +, next 128 are -
data500_1_seq = t(getSequence(data500_1_fasta))
# make a data frame from it
data500_1_df = as.data.frame(data500_1_seq)
# add the class column to it
class = append(rep("+",times=128),rep("-",times=128))
data500_1_df = cbind(Class=class,data500_1_df)
# finally apply the factor() on the data frame
data500_1_df = data.frame(lapply(data500_1_df,factor))
#Now train and get the model
modelforSVM <- ksvm(Class ~ ., data = data500_1_df, kernel = "rbfdot", kpar
= "automatic", C = 60, cross = 3, prob.model = TRUE)
and finally:
tindex =sample(1:dim(data500_1_df)[1], 1)
testSeq=data500_1_df[tindex,]
predict(modelforSVM, testSeq);
Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") : contrasts can
be applied only to factors with 2 or more levels
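I do not know yet why this last case fails, since the row comes from the training frame itself. As a way of narrowing it down, here is a small diagnostic sketch in base R. The helper name lowLevelColumns is my own invention and the toy data is made up. It lists the factor columns of a data frame that have fewer than two levels, which is the condition the error message names:

```r
# Hypothetical helper: names of factor columns with fewer than two levels.
lowLevelColumns <- function(df) {
  isFactor <- vapply(df, is.factor, logical(1))
  counts <- vapply(df[isFactor], nlevels, integer(1))
  names(counts)[counts < 2]
}

# Toy example (made-up data): column "a" has a single level, "b" has two,
# and "n" is not a factor at all.
toy <- data.frame(
  a = factor(c("A", "A")),
  b = factor(c("A", "C")),
  n = 1:2
)
print(lowLevelColumns(toy))
print(dim(toy))
```

Running something like this on both data500_1_df and testSeq, together with dim() on each, should at least show whether the frame has the shape I think it has (one row per sequence, one column per position).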
I am very confused at this point. What am I doing wrong? How do I use the
factor() function properly so that I don't get this error? Am I even heading
in the right direction?
Thanks in anticipation of your help.
-vishal
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.