Re: [R] Help with SVM package Kernlab

Vishal Thapar Thu, 24 Dec 2009 21:54:12 -0800

Hi Steve,

Thank you so much for the reply. Here are the answers to your questions:

> What do these commands return over your data?


1. is(train500)
--> "data.frame" "list"       "oldClass"   "mpinput"    "vector"
2. is(train500$class)
--> "NULL"             "OptionalFunction" "output"
3. is(train500[1,5])
--> "factor"   "integer"  "oldClass" "output"   "numeric"  "vector"
4. is(testSeq)
--> "data.frame" "list"       "oldClass"   "mpinput"    "vector"
5. is(testSeq[1,5])
--> "factor"   "integer"  "oldClass" "output"   "numeric"  "vector"
6. is(testSeq$class)
--> "NULL"             "OptionalFunction" "output"



> How similar are we talking -- something is (obviously) off because
> using the promotergene dataset is quite straightforward:
>
> library(kernlab)
> data(promotergene)
> tr <- promotergene[1:90,]
> ts <- promotergene[91:106,]
> m <- ksvm(Class~., data=tr, kernel="rbfdot", kpar =
> "automatic", C = 60, cross = 3, prob.model = TRUE)
> p <- predict(m, ts)
>
Right, here is the first line from my training set:
  Class V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
1     +  T  A  A  A  C  T  T  A  T   A   A   A   T   A   T   A   A   A   A   C   T   T   T   T   T   A   A   T
    V487 V488 V489 V490 V491 V492 V493 V494 V495 V496 V497 V498 V499 V500
1    G    A    T    T    T    C    A    T    T    T    T    G    T    T

Here is the first record of the promotergene data set:

  Class V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
1     +  g  c  c  t  t  c  t  c   c   a   a   a   a   c   g   t   g   t   t   t   t   t   t   g   t   t   g   t   t   a   a   t   t   c   g   g   t
  V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56 V57 V58
1   g   t   a   g   a   c   t   t   g   t   a   a   a   c   c   t   a   a   a   t


> > testSeq is a vector of 500 characters cast as a data.frame.
>
> What does that mean, exactly? How did you do that?
>
My training set and test set are different. I train the model on data
generated in a particular experiment, and then I use it to predict across
the Arabidopsis genome. The genome comes as FASTA files of 500 bp
sequences, which I read in with scan() as follows:

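library(kernlab)  # ksvm(), predict()
library(seqinr)   # s2c(): split a string into single characters

# my.file must be an open connection, e.g. my.file <- file(path, open = "r").
# If it is a plain file name, every scan() call re-reads the first two lines.
chr4Seq <- scan(my.file, list("", ""), nlines = 2)

while (length(chr4Seq[[1]]) > 0) {
    seqId <- chr4Seq[[1]]
    # one-row data.frame: columns V1..V500, one base per column
    testSeq <- as.data.frame(t(s2c(chr4Seq[[2]])))
    # optional: added later to see whether having a Class column removes the error
    testSeq <- cbind(Class = "-", testSeq)
    predictSvm1 <- predict(modelforSVM, testSeq)
    print(predictSvm1)
    chr4Seq <- scan(my.file, list("", ""), nlines = 2)
}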
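If each FASTA record really is exactly one header line followed by one sequence line (which the nlines = 2 scan() call assumes), the reading step of this loop can be sketched with readLines() on a connection that is opened once, so that each call continues where the previous one stopped. This swaps readLines() in for scan(); the temporary file and its two records are made up for illustration.

```r
# Sketch: read a two-lines-per-record FASTA file from an open connection.
# The file and its contents are hypothetical, created here for illustration.
fasta <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "TTGACCAA"), fasta)

con <- file(fasta, open = "r")   # opened once: reads resume where they stopped
ids  <- character(0)
seqs <- character(0)
repeat {
  rec <- readLines(con, n = 2)   # header line + sequence line
  if (length(rec) < 2) break     # end of file (or a truncated last record)
  ids  <- c(ids, sub("^>", "", rec[1]))
  seqs <- c(seqs, rec[2])
}
close(con)

print(ids)
print(seqs)
```

For FASTA files whose sequences wrap over several lines, seqinr's read.fasta() is the more usual route.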


> Can't you just start with all of your data in a data.frame and "cut out" the
> training and testing data.frames like I did above with the
> promotergene data (see the tr and ts vars)
>

I can't cut the test data out of the training set, because I want to apply
this model to the entire chromosome.
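Cutting tr and ts out of one data.frame has a side effect that may matter here: both pieces share the same factor levels, whereas a test row built from a single sequence with as.data.frame(t(...)) has only one level per column. One plausible cause of the error (an assumption, since the error message is not shown) is that predict() on a formula-fitted ksvm model expands newdata with model.matrix(), which then needs the training levels. The base-R sketch below builds test rows carrying the training levels without taking them from the training set; the toy data, nt and the seq_to_row() helper are hypothetical. (Separately, R names are case-sensitive: train500$class is NULL because the column is called Class.)

```r
# Assumption: every sequence column is a factor over the alphabet nt.
# nt, train_seqs and seq_to_row() are hypothetical names for this sketch.
nt <- c("A", "C", "G", "T")

# Toy stand-in for train500: four aligned sequences plus a Class column.
train_seqs <- c("ACGT", "CGTA", "GTAC", "TACG")
train <- as.data.frame(do.call(rbind, strsplit(train_seqs, "")),
                       stringsAsFactors = FALSE)   # columns V1..V4
train[] <- lapply(train, factor, levels = nt)
train <- data.frame(Class = factor(c("+", "-", "+", "-")), train)

# Turn one sequence string into a one-row data.frame whose columns carry
# the same factor levels as the matching training columns.
seq_to_row <- function(s, train) {
  bases <- strsplit(toupper(s), "")[[1]]
  row <- as.data.frame(t(bases), stringsAsFactors = FALSE)
  names(row) <- paste0("V", seq_along(bases))
  for (v in names(row)) {
    # bases outside the training levels (e.g. "N") become NA here
    row[[v]] <- factor(row[[v]], levels = levels(train[[v]]))
  }
  row
}

# A one-level factor, as produced for a single sequence, cannot be
# expanded into dummy variables by model.matrix():
bad <- data.frame(V1 = factor("A"))
err <- tryCatch(model.matrix(~ ., bad), error = function(e) conditionMessage(e))
print(err)

# With the training levels, the test row expands to the same columns
# as the training data:
test <- seq_to_row("acga", train)
X_train <- model.matrix(~ . - Class, train)
X_test  <- model.matrix(~ ., test)
print(identical(colnames(X_train), colnames(X_test)))
```

For the real data the levels would come from train500's own columns; if some position never shows all four bases in training, setting levels = nt on both sides (and refitting) keeps the expanded columns consistent.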

Thanks again for any suggestions that you can give.

Sincerely,

Vishal
-- 
Vishal Thapar, Ph.D.
Post Doctoral Researcher
Cold Spring Harbor Lab
Williams Bldg

1 Bungtown Road
Cold Spring Harbor, NY - 11724


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
