Hi,

I seem to have made some headway on this problem, but it's still not solved.
It seems to be a "factor" issue. When I read my training set, I read it
with read.csv(), which converts each character column to a factor. If I
then take a single row of it as my testSeq, it works great. On the other
hand, when I read my test sequence from a Fasta file, I use the "seqinr"
package's function "read.fasta()", or, if I read a sequence directly from a
file, I use "scan()", e.g.:

library(kernlab) # provides ksvm()
train500 = read.csv("toClassify500_1.csv",header=TRUE) # read the training set
modelforSVM <- ksvm(Class ~ ., data = train500, kernel = "rbfdot", kpar =
"automatic", C = 60, cross = 3, prob.model = TRUE)
Now if I do:
tindex =sample(1:dim(train500)[1], 1)
testSeq=train500[tindex,]
predict(modelforSVM, testSeq);
It works great.

BUT if I do:

library(seqinr) # provides s2c()
my.file=file("chr4_seqs.fasta", open="r")
chr4Seq = scan(my.file,list("",""),nlines=2) # read the data from a fasta file using scan()

seqId = chr4Seq[[1]]
# s2c() just converts the "STRING" to a char vector "S" "T" "R" "I" "N" "G"
testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))

predict(modelforSVM, testSeq);
Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels
-------------------------
If I apply factor() to testSeq, it still doesn't work, e.g.:

testSeq=data.frame(lapply(testSeq,factor))

I still get:
Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels
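
A possible explanation, sketched with base R only (toy data, not the real
sequences, and not checked against kernlab itself): a one-row data frame
passed through factor() ends up with exactly one level per column, which is
what the contrasts message complains about. Rebuilding each column with the
levels of the matching training column avoids the error in this toy case.
The names train, new_row, naive, aligned, V1 and V2 are made up for
illustration.

```r
# Toy training data: every predictor column is a factor with several levels.
train <- data.frame(
  Class = factor(c("+", "+", "-", "-")),
  V1 = factor(c("A", "C", "G", "T")),
  V2 = factor(c("A", "A", "C", "G"))
)

# A single new sequence, one character per column (hypothetical values).
new_row <- data.frame(V1 = "G", V2 = "C", stringsAsFactors = FALSE)

# Naive conversion: each column becomes a factor with only ONE level.
naive <- data.frame(lapply(new_row, factor))
n_naive_levels <- nlevels(naive$V1)

# Building a design matrix from one-level factors fails; keep the message.
err <- tryCatch(
  model.matrix(~ ., data = naive),
  error = function(e) conditionMessage(e)
)

# Aligned conversion: reuse the levels seen in the training data.
aligned <- new_row
for (col in names(aligned)) {
  aligned[[col]] <- factor(aligned[[col]], levels = levels(train[[col]]))
}

# With the training levels in place, the design matrix can be built.
mm <- model.matrix(~ ., data = aligned)
```

If this is indeed the cause, the same loop applied to testSeq (using the
levels from train500) would be the thing to try before calling predict().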

Another thing I tried was reading the fasta file using the read.fasta()
function and taking a sample input from the training set itself:

data500_1_fasta = read.fasta("toClassify500.fasta") # read a fasta file via the seqinr package
# get the sequences from it: 256 sequences, first 128 are +, next 128 are -
data500_1_seq = t(getSequence(data500_1_fasta))
data500_1_df = as.data.frame(data500_1_seq) # make a data frame from it
class = append(rep("+",times=128),rep("-",times=128)) # the class column to add
data500_1_df = cbind(Class=class,data500_1_df)
data500_1_df = data.frame(lapply(data500_1_df,factor)) # finally apply factor() to the data frame

#Now train and get the model

modelforSVM <- ksvm(Class ~ ., data = data500_1_df, kernel = "rbfdot", kpar
= "automatic", C = 60, cross = 3, prob.model = TRUE)

and finally:
tindex =sample(1:dim(data500_1_df)[1], 1)
testSeq=data500_1_df[tindex,]
predict(modelforSVM, testSeq);

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels
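
One way to narrow this down might be to list the factor columns that have
fewer than two levels, since those are presumably the ones the contrasts
message refers to. This is a small base-R sketch; the helper name
single_level_columns and the toy data frame demo are made up here and are
not part of seqinr or kernlab.

```r
# Hypothetical helper: names of factor columns with fewer than two levels.
single_level_columns <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  n_lev <- vapply(df[is_fac], nlevels, integer(1))
  names(n_lev)[n_lev < 2]
}

# Toy example: V2 is constant, so it has a single level.
demo <- data.frame(
  V1 = factor(c("A", "C")),
  V2 = factor(c("G", "G"))
)
bad_cols <- single_level_columns(demo)
```

Running single_level_columns(testSeq) and single_level_columns(data500_1_df)
would show whether the data frames actually have the shape I expect.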

I am very confused at this point. What am I doing wrong? How do I use the
factor() function properly so that I don't get this error? Am I even
heading in the right direction?

Thanks in anticipation of your help.

-vishal


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
