ss wrote:
Thanks, Erik. I will try your code soon.
I did this first:
> data <-
read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
row.names = NULL ,header=TRUE, fill=TRUE)
> class(data[[3]])
[1] "factor"
> is.numeric(data[[3]])
[1] FALSE
>
So it is not numeric but 'factor' instead.
Can I convert this column to numeric?
That depends. My first question, if I were you, would be: why does
read.table assign the class 'factor' to this column?
Then read ?factor, paying particular attention to this passage:
In particular,
'as.numeric' applied to a factor is meaningless, and may happen by
implicit coercion. To transform a factor 'f' to its original
numeric values, 'as.numeric(levels(f))[f]' is recommended and
slightly more efficient than 'as.numeric(as.character(f))'.
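To make the quoted warning concrete, here is a small sketch. The factor `f` is made up for illustration and is not taken from the data in this thread. It shows that as.numeric() applied directly to a factor returns the internal level codes, while the idiom recommended in ?factor recovers the values.

```r
# Hypothetical factor, standing in for a column that read.table read as a factor
f <- factor(c("10", "20", "10", "5"))

# Direct coercion returns the integer level codes, not the original values
codes <- as.numeric(f)

# Idiom from ?factor: convert the levels once, then index by the factor
values <- as.numeric(levels(f))[f]

# Equivalent result, slightly less efficient
values2 <- as.numeric(as.character(f))

print(codes)
print(values)
```

The levels are sorted as strings, so the codes bear no relation to the numbers they label.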
You might also try levels(data[[3]]), but the list will be long. The
goal is to find the value(s) that are causing read.table to assign the
class 'factor' to this column. You have lots of values though, so I
might try something like the following:
setdiff(levels(data[[3]]),
as.character(as.numeric(levels(data[[3]])[data[[3]]])))
and look at what that returns (you'll get a warning). Hopefully that
tells you what is missing.
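A more direct variant of the same check, offered as a sketch: convert the levels to numeric and keep the ones that come back as NA. The factor `col` below is invented for illustration; with the real data it would be data[[3]]. The setdiff() approach above may also flag levels that are valid numbers written differently (for example "0.10", which round-trips to "0.1"); this variant reports only the levels that cannot be converted at all.

```r
# Made-up column standing in for data[[3]]; "null" and "n/a" are the culprits
col <- factor(c("-1.6", "0.18", "null", "1.63", "n/a", "0.10"))

# Convert each level once; non-numeric strings become NA
# (as.numeric would warn about this, so the warning is silenced here)
lev_num <- suppressWarnings(as.numeric(levels(col)))

# Levels that are not numbers
bad_levels <- levels(col)[is.na(lev_num)]
print(bad_levels)

# Row positions where those values occur
bad_rows <- which(col %in% bad_levels)
print(bad_rows)
```

Once the offending strings are known, they can be fixed in the file or handled when reading it.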
I see your new email, so that's that!
Good luck,
Erik
Allen
On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
ss wrote:
It is:
> data <-
read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
row.names = NULL ,header=TRUE, fill=TRUE)
> class(data[3])
[1] "data.frame"
>
Oops, should have said class(data[[3]]) and
is.numeric(data[[3]])
See ?Extract
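The difference between the two extraction operators can be seen on a small made-up data frame (the names `df`, `sub_df` and `vec` are only for this illustration): single brackets return a one-column data frame, double brackets return the column itself.

```r
# Small made-up data frame
df <- data.frame(id = c("a", "b", "c"), value = c(0.16, 0.59, -0.62))

sub_df <- df[2]     # single bracket: still a data.frame with one column
vec    <- df[[2]]   # double bracket: the column vector itself

print(class(sub_df))
print(class(vec))

# is.numeric() is FALSE for any data.frame, whatever its columns contain
print(is.numeric(sub_df))
print(is.numeric(vec))
```

This is why is.numeric(data[3]) says nothing about the column's contents.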
And if I try to use as.matrix(read.table()), I got:
>data
<-as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
+ row.names = NULL ,header=TRUE, fill=TRUE))
> data[1:4,1:4]
Probe_ID Gene_Symbol M16012391010920 M16012391010525
[1,] "A_23_P105862" "13CDNA73" "-1.6" " 0.16"
[2,] "A_23_P76435" "15E1.2" "0.18" " 0.59"
[3,] "A_24_P402115" "15E1.2" "1.63" "-0.62"
[4,] "A_32_P227764" "15E1.2" "-0.76" "-0.42"
You see that every value is surrounded by quotes ("").
I don't see the quotes if I just use read.table.
That is because a matrix (an object of class 'matrix') holds a single
type. Since your data frame has non-numeric columns, as.matrix() converts
everything to character (including the numbers), which you certainly do
NOT want.
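A short sketch of that coercion, using an invented data frame `df` rather than the data from this thread: converting mixed columns gives a character matrix, while converting only the numeric columns keeps the numbers.

```r
# Made-up data frame with one text column and two numeric columns
df <- data.frame(id = c("a", "b"), x = c(-1.6, 0.18), y = c(0.16, 0.59))

m_all <- as.matrix(df)        # mixed columns: every cell becomes character
m_num <- as.matrix(df[2:3])   # numeric columns only: stays numeric

print(typeof(m_all))
print(typeof(m_num))
```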
You want a data.frame, I will provide an example of what I think you
are after.
Try the following commands and see how they compare to your
situation: these work for me.
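test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
                   y = rnorm(26), z = rnorm(26))
test
class(test)
is.numeric(test[[2]])
is.numeric(test[[3]])
rowMeans(test)       # error: column x is a factor, so 'x' must be numeric
rowMeans(test[2:3])  # works: uses only the numeric columns y and z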
> data <-
read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
row.names = NULL ,header=TRUE, fill=TRUE)
> data[1:4,1:4]
Probe_ID Gene_Symbol M16012391010920 M16012391010525
1 A_23_P105862 13CDNA73 -1.6 0.16
2 A_23_P76435 15E1.2 0.18 0.59
3 A_24_P402115 15E1.2 1.63 -0.62
4 A_32_P227764 15E1.2 -0.76 -0.42
Thanks,
Allen
On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
ss wrote:
Hi Wacek,
Yes, data is data frame not a matrix.
is.numeric(data[3])
[1] FALSE
What is class(data[3])?
But I looked at column 3 and it looks okay. There are a few NAs, and
I did not find anything strange.
Any suggestions?
Thanks,
Allen
On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
ss wrote:
Thank you very much, Wacek! It works very well.
But there is a minor problem. I did the following:
data <-
read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
+row.names = NULL ,header=TRUE, fill=TRUE)
looks like you have a data frame, not a matrix
dim(data)
[1] 23963 85
data[1:4,1:4]
      Probe_ID Gene_Symbol M16012391010920 M16012391010525
1 A_23_P105862    13CDNA73            -1.6            0.16
2  A_23_P76435      15E1.2            0.18            0.59
3 A_24_P402115      15E1.2            1.63           -0.62
4 A_32_P227764      15E1.2           -0.76           -0.42
data1<-data[sapply(data, is.numeric)]
dim(data1)
[1] 23963 82
data1[1:4,1:4]
  M16012391010525 M16012391010843 M16012391010531 M16012391010921
1            0.16           -0.23           -1.40            0.90
2            0.59            0.28           -0.30            0.08
3           -0.62           -0.62           -0.22           -0.18
4           -0.42            0.01            0.28           -0.79
You will notice that, after using 'data[sapply(data, is.numeric)]' to get
data1, the first sample in data, 'M16012391010920', is missing from data1.
Any further suggestions?
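One way to see which columns such a filter drops, and why, is to list the class of every column. The sketch below uses an invented data frame `df` whose column names only mimic the ones in this thread; the value "null" is a hypothetical bad entry.

```r
# Made-up data frame: an ID column, a sample column that was read as a
# factor because of one bad entry, and a clean numeric sample column
df <- data.frame(Probe_ID = c("A_1", "A_2", "A_3"),
                 M1 = factor(c("-1.6", "0.18", "null")),
                 M2 = c(0.16, 0.59, -0.62))

# Class of every column
col_classes <- sapply(df, class)
print(col_classes)

# Names of the columns that df[sapply(df, is.numeric)] would drop
dropped <- names(df)[!sapply(df, is.numeric)]
print(dropped)
```

Any sample column that shows up in `dropped` is worth inspecting for non-numeric entries.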
surely there must be an entry in column 3 that makes it non-numeric.
what does is.numeric(data[3]) say? (NAs should not make a column
non-numeric, unless there are only NAs there, which is not the case
here.) check your data for non-numeric entries in column 3, there can
be a typo.
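The point about NAs can be checked on a tiny inline table. This is only a sketch: the text in `txt` is invented, and stringsAsFactors = TRUE is set explicitly because it was the default only before R 4.0.0.

```r
# Invented three-column table: 'a' has some NAs, 'b' has only NAs,
# 'c' has one non-numeric entry ("oops")
txt <- "a b c
1.5 NA 1.2
NA NA oops
2.5 NA 3.4"

d <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# 'a' stays numeric despite the NA, 'b' becomes logical,
# and the single bad entry turns 'c' into a factor
col_classes <- sapply(d, class)
print(col_classes)
```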
vQ
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.