Re: [R] Problem with rowMeans()

Erik Iverson Thu, 12 Jun 2008 17:18:01 -0700


ss wrote:

Thanks, Erik. I will try your code soon.

I did this first:
> data <-read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',row.names = NULL ,header=TRUE, fill=TRUE)
 > class(data[[3]])
[1] "factor"
 > is.numeric(data[[3]])
[1] FALSE
 >

So it is not numeric but 'factor' instead.
Can I convert this column to numeric?

That depends. My first question, if I were you, would be: 'Why does read.table assign the class factor to this column?'


Then read ?factor, paying particular attention to,

  In particular,
     'as.numeric' applied to a factor is meaningless, and may happen by
     implicit coercion.  To transform a factor 'f' to its original
     numeric values, 'as.numeric(levels(f))[f]' is recommended and
     slightly more efficient than 'as.numeric(as.character(f))'.
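A minimal sketch of the pitfall that ?factor describes, using a made-up factor f (not the poster's data): as.numeric() on a factor returns the internal level codes, not the numbers the labels spell out. The values are chosen so the level order is the same in every locale.

```r
# Hypothetical example: numbers stored as a factor, as read.table may produce
f <- factor(c("10", "2", "33", "2"))

# Wrong: returns the integer level codes (levels are sorted as strings)
codes <- as.numeric(f)
print(codes)   # 1 2 3 2

# Recommended in ?factor: convert the levels, then index them by the factor
vals <- as.numeric(levels(f))[f]
print(vals)    # 10 2 33 2

# Equivalent, slightly less efficient
vals2 <- as.numeric(as.character(f))
```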

You might also try levels(data[[3]]), but the list will be long. The goal is to find the value(s) that are causing read.table to assign the class 'factor' to this column. You have lots of values, though, so I might try something like the following:

setdiff(levels(data[[3]]),as.character(as.numeric(levels(data[[3]])[data[[3]]])))

and look at what that returns (you'll get a warning). Hopefully that tells you what is missing.
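A possibly simpler variant, sketched on a small made-up data frame (the stray "n/a" entry is hypothetical): convert each level on its own and keep the ones that come back as NA. This assumes the problem column is a factor, as it is in this thread.

```r
# Made-up stand-in for the poster's data: column 3 holds one non-numeric entry
data <- data.frame(
  Probe_ID = c("A_23_P105862", "A_23_P76435", "A_24_P402115"),
  Gene_Symbol = c("13CDNA73", "15E1.2", "15E1.2"),
  M16012391010920 = factor(c("-1.6", "0.18", "n/a"))
)

# Levels of the problem column
levs <- levels(data[[3]])

# Levels that cannot be parsed as numbers (as.numeric gives NA for them)
bad <- levs[is.na(suppressWarnings(as.numeric(levs)))]
print(bad)
```

Comparing numbers as strings, as the setdiff() approach does, can also flag harmless values such as "0.10" (which round-trips to "0.1"); testing for NA avoids that.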


I see your new email, so that's that!

Good luck,
Erik


Allen

On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:




    ss wrote:

        It is:

         > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', row.names = NULL, header = TRUE, fill = TRUE)
         > class(data[3])
        [1] "data.frame"
         >


    Oops, I should have said class(data[[3]]) and is.numeric(data[[3]]).

    See ?Extract
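A short sketch of the difference that ?Extract documents, on a made-up two-column data frame: single brackets return a one-column data frame, double brackets return the column itself. That is why is.numeric(data[3]) is FALSE even for a perfectly numeric column.

```r
# Small made-up data frame
df <- data.frame(id = c("a", "b"), value = c(1.5, 2.5))

# Single brackets: a one-column data frame (a list), never "numeric"
class(df[2])         # "data.frame"
is.numeric(df[2])    # FALSE

# Double brackets: the column vector itself
class(df[[2]])       # "numeric"
is.numeric(df[[2]])  # TRUE
```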



        And if I try to use as.matrix(read.table()), I get:

         > data <- as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', row.names = NULL, header = TRUE, fill = TRUE))
         > data[1:4,1:4]
             Probe_ID       Gene_Symbol M16012391010920 M16012391010525
        [1,] "A_23_P105862" "13CDNA73"  "-1.6"          " 0.16"
        [2,] "A_23_P76435"  "15E1.2"    "0.18"          " 0.59"
        [3,] "A_24_P402115" "15E1.2"    "1.63"          "-0.62"
        [4,] "A_32_P227764" "15E1.2"    "-0.76"         "-0.42"

        You can see the values are surrounded by quotes ("").

        I don't see the quotes if I just use read.table.


    That is because a matrix (an object of class 'matrix') holds values
    of a single type, so as.matrix() converts everything to character
    (including the numbers), which you certainly do NOT want.
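A minimal sketch of that coercion, on a made-up data frame that mixes an ID column with a numeric column: as.matrix() on the whole frame yields a character matrix, while dropping the non-numeric column first keeps the values numeric.

```r
# Made-up data frame mixing character IDs and numeric values
df <- data.frame(Probe_ID = c("A_23_P105862", "A_23_P76435"),
                 x = c(-1.6, 0.18),
                 stringsAsFactors = FALSE)

# Whole frame: every value, numbers included, becomes a string
m <- as.matrix(df)
typeof(m)      # "character"

# Numeric columns only: the matrix stays numeric
m_num <- as.matrix(df[-1])
typeof(m_num)  # "double"
```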

    You want a data.frame. Here is an example of what I think you are
    after.

    Try the following commands and see how they compare to your
    situation: these work for me.

    test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
                       y = rnorm(26), z = rnorm(26))

    test

    class(test)

    is.numeric(test[[2]])

    is.numeric(test[[3]])

    rowMeans(test)       # error: column x is a factor, not numeric

    rowMeans(test[2:3])  # works: only the numeric columns y and z
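With 80+ sample columns, selecting columns by position gets tedious. A sketch that picks the numeric columns automatically, reusing the test object above (set.seed() is added here only to make the example reproducible). This is the same sapply(..., is.numeric) selection used elsewhere in the thread, so a sample column that was read as a factor is silently left out, which is exactly why column 3 went missing.

```r
set.seed(1)  # added only so the example is reproducible
test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
                   y = rnorm(26), z = rnorm(26))

# Keep only the numeric columns (drops the factor x)
num_cols <- sapply(test, is.numeric)
means <- rowMeans(test[num_cols])

# Same result as selecting columns 2:3 by position
all.equal(means, rowMeans(test[2:3]))
```

If some rows contain NAs, rowMeans(test[num_cols], na.rm = TRUE) averages over the non-missing values instead of returning NA.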

         > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', row.names = NULL, header = TRUE, fill = TRUE)
         > data[1:4,1:4]
             Probe_ID Gene_Symbol M16012391010920 M16012391010525
        1 A_23_P105862    13CDNA73            -1.6            0.16
        2  A_23_P76435      15E1.2            0.18            0.59
        3 A_24_P402115      15E1.2            1.63           -0.62
        4 A_32_P227764      15E1.2           -0.76           -0.42


        Thanks,
             Allen



        On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:



           ss wrote:

               Hi Wacek,

               Yes, data is data frame not a matrix.

                   is.numeric(data[3])

               [1] FALSE


           What is class(data[3])?


               But I looked at column 3 and it looks okay. There are a few NAs,
               and I did not find anything strange.

               Any suggestions?

               Thanks,
                    Allen



               On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

                   ss wrote:

                       Thank you very much, Wacek! It works very well.
                       But there is a minor problem. I did the following:

                           data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', row.names = NULL, header = TRUE, fill = TRUE)

                   looks like you have a data frame, not a matrix


                           dim(data)

                       [1] 23963    85

                           data[1:4,1:4]

                             Probe_ID Gene_Symbol M16012391010920 M16012391010525
                       1 A_23_P105862    13CDNA73            -1.6            0.16
                       2  A_23_P76435      15E1.2            0.18            0.59
                       3 A_24_P402115      15E1.2            1.63           -0.62
                       4 A_32_P227764      15E1.2           -0.76           -0.42


                           data1<-data[sapply(data, is.numeric)]
                           dim(data1)

                       [1] 23963    82

                           data1[1:4,1:4]

                         M16012391010525 M16012391010843 M16012391010531 M16012391010921
                       1            0.16           -0.23           -1.40            0.90
                       2            0.59            0.28           -0.30            0.08
                       3           -0.62           -0.62           -0.22           -0.18
                       4           -0.42            0.01            0.28           -0.79

                       You will notice that, after using 'data[sapply(data, is.numeric)]'
                       to get data1, the first sample in data, 'M16012391010920', was
                       dropped from data1.

                       Any further suggestions?

                   surely there must be an entry in column 3 that makes it non-numeric.
                   what does is.numeric(data[3]) say?  (NAs should not make a column
                   non-numeric, unless there are only NAs there, which is not the case
                   here.)  check your data for non-numeric entries in column 3; there
                   may be a typo.
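One common culprit is a missing-value marker other than "NA". A sketch with made-up file contents where "." marks a missing value: with read.table's default na.strings = "NA", that single "." makes the whole column non-numeric (a factor in older R, character since R 4.0.0 changed the stringsAsFactors default). The example uses read.table's text= argument, which is only available in more recent R releases than the one in this thread; with a file you would pass the file name instead.

```r
# Made-up file contents: "." marks a missing value instead of NA
txt <- "Probe_ID M1 M2
A_23_P105862 -1.6 0.16
A_23_P76435 . 0.59"

# Default na.strings = "NA": the "." makes column M1 non-numeric
d1 <- read.table(text = txt, header = TRUE)
is.numeric(d1$M1)   # FALSE

# Declaring "." as a missing-value marker keeps M1 numeric
d2 <- read.table(text = txt, header = TRUE, na.strings = c("NA", "."))
is.numeric(d2$M1)   # TRUE
```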

                   vQ



               ______________________________________________
               R-help@r-project.org mailing list
               https://stat.ethz.ch/mailman/listinfo/r-help
               PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
               and provide commented, minimal, self-contained, reproducible code.


