Dear Erik and Wacek, please don't spend any more time on my problem. I deleted the second column and the problem is gone. I don't know why, but apparently the second column somehow interfered with the third column, so that the third column was read as 'factor' rather than 'numeric'.
I can recover the 2nd column, which holds the gene symbols, later, so I will not worry about it for now. I just don't want you to invest your precious time on this.

Thanks much,
Allen

On Thu, Jun 12, 2008 at 8:01 PM, ss <[EMAIL PROTECTED]> wrote:

> Thanks, Erik. I will try your code soon.
>
> I did this first:
>
> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> row.names = NULL ,header=TRUE, fill=TRUE)
> > class(data[[3]])
> [1] "factor"
> > is.numeric(data[[3]])
> [1] FALSE
>
> So it is not numeric but 'factor' instead.
> Can I convert this column to numeric?
>
> Allen
>
> On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
>
>> ss wrote:
>>
>>> It is:
>>>
>>> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> row.names = NULL ,header=TRUE, fill=TRUE)
>>> > class(data[3])
>>> [1] "data.frame"
>>
>> Oops, should have said class(data[[3]]) and
>> is.numeric(data[[3]])
>>
>> See ?Extract
>>
>>> And if I try to use as.matrix(read.table()), I got:
>>>
>>> > data <- as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> + row.names = NULL ,header=TRUE, fill=TRUE))
>>> > data[1:4,1:4]
>>>      Probe_ID       Gene_Symbol M16012391010920 M16012391010525
>>> [1,] "A_23_P105862" "13CDNA73"  "-1.6"          " 0.16"
>>> [2,] "A_23_P76435"  "15E1.2"    "0.18"          " 0.59"
>>> [3,] "A_24_P402115" "15E1.2"    "1.63"          "-0.62"
>>> [4,] "A_32_P227764" "15E1.2"    "-0.76"         "-0.42"
>>>
>>> You see they are surrounded by "".
>>>
>>> I don't see such if I just use >read.table
>>
>> That is because matrices (objects of class 'matrix') are of homogeneous
>> type. It changes everything to a character (including the numbers), which
>> you certainly do NOT want.
>>
>> You want a data.frame, I will provide an example of what I think you are
>> after.
>>
>> Try the following commands and see how they compare to your situation:
>> these work for me.
>>
>> test <- data.frame(x = factor(rep(c("A", "B"), each = 13)), y = rnorm(26),
>>                    z = rnorm(26))
>>
>> test
>>
>> class(test)
>>
>> is.numeric(test[[2]])
>>
>> is.numeric(test[[3]])
>>
>> rowMeans(test)
>>
>> rowMeans(test[2:3])
>>
>>> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> row.names = NULL ,header=TRUE, fill=TRUE)
>>> > data[1:4,1:4]
>>>       Probe_ID Gene_Symbol M16012391010920 M16012391010525
>>> 1 A_23_P105862    13CDNA73            -1.6            0.16
>>> 2  A_23_P76435      15E1.2            0.18            0.59
>>> 3 A_24_P402115      15E1.2            1.63           -0.62
>>> 4 A_32_P227764      15E1.2           -0.76           -0.42
>>>
>>> Thanks,
>>> Allen
>>>
>>> On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
>>>
>>> ss wrote:
>>>
>>> Hi Wacek,
>>>
>>> Yes, data is a data frame, not a matrix.
>>>
>>> is.numeric(data[3])
>>>
>>> [1] FALSE
>>>
>>> what is class(data[3])
>>>
>>> But I looked at column 3 and it looks okay though. There are a few NAs
>>> and I did not find anything strange.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>> Allen
>>>
>>> On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>>>
>>> ss wrote:
>>>
>>> Thank you very much, Wacek! It works very well.
>>> But there is a minor problem.
>>> I did the following:
>>>
>>> data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> + row.names = NULL ,header=TRUE, fill=TRUE)
>>>
>>> looks like you have a data frame, not a matrix
>>>
>>> dim(data)
>>>
>>> [1] 23963    85
>>>
>>> data[1:4,1:4]
>>>
>>>       Probe_ID Gene_Symbol M16012391010920 M16012391010525
>>> 1 A_23_P105862    13CDNA73            -1.6            0.16
>>> 2  A_23_P76435      15E1.2            0.18            0.59
>>> 3 A_24_P402115      15E1.2            1.63           -0.62
>>> 4 A_32_P227764      15E1.2           -0.76           -0.42
>>>
>>> data1<-data[sapply(data, is.numeric)]
>>> dim(data1)
>>>
>>> [1] 23963    82
>>>
>>> data1[1:4,1:4]
>>>
>>>   M16012391010525 M16012391010843 M16012391010531 M16012391010921
>>> 1            0.16           -0.23           -1.40            0.90
>>> 2            0.59            0.28           -0.30            0.08
>>> 3           -0.62           -0.62           -0.22           -0.18
>>> 4           -0.42            0.01            0.28           -0.79
>>>
>>> You will notice that, after using 'data[sapply(data, is.numeric)]' and
>>> getting data1, the first sample in data, called 'M16012391010920', was
>>> missed in data1.
>>>
>>> Any further suggestions?
>>>
>>> surely there must be an entry in column 3 that makes it non-numeric.
>>> what does is.numeric(data[3]) say? (NAs should not make a column
>>> non-numeric, unless there are only NAs there, which is not the case
>>> here.) check your data for non-numeric entries in column 3, there can
>>> be a typo.
>>>
>>> vQ
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
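For anyone finding this thread later: the question asked above, whether a 'factor' column can be converted to numeric, and the suggestion to check column 3 for non-numeric entries, can be sketched roughly as follows. This is only a minimal sketch, not what anyone in the thread ran. The miniature table, the column names S1 and S2, and the stray 'n/a' entry are all invented for illustration, and stringsAsFactors = TRUE is set explicitly on the assumption that the 2008 default (strings read as factors) is what produced the 'factor' class reported above.

```r
# Hypothetical miniature of the file; the "n/a" entry is an invented
# stand-in for whatever stray text was in the real column.
txt <- "Probe_ID Gene_Symbol S1 S2
A_23_P105862 13CDNA73 -1.6 0.16
A_23_P76435 15E1.2 n/a 0.59
A_24_P402115 15E1.2 1.63 -0.62"

# stringsAsFactors = TRUE is an assumption: it mimics the older default
# under which a column with any non-numeric entry is read as a factor.
d <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(d[[3]])

# Convert via character first; as.numeric() applied directly to a factor
# would return the internal level codes rather than the printed values.
raw <- as.character(d[[3]])
num <- suppressWarnings(as.numeric(raw))

# Entries that were present in the file but failed to parse as numbers.
bad <- which(is.na(num) & !is.na(raw))
raw[bad]

# Put the numeric version back and take row means over numeric columns only.
d[[3]] <- num
rowMeans(d[sapply(d, is.numeric)])
```

Once the offending entries are known, they can be corrected in the file itself, or declared as missing values through the na.strings argument of read.table so that the column is read as numeric in the first place.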