Hi,
See comments inline:
On Aug 11, 2009, at 2:45 PM, Jim Bouldin wrote:
No problem John, thanks for your help, and also thanks to Dan and
Patrick.
Wasn't able to read or try anybody's suggestions yesterday. Here's what
I've discovered in the meantime:
What I did not include yesterday is that my original data frame, called
"data", was this:
X Y V3
1 1 1 0.000000
2 2 1 8.062258
3 3 1 2.236068
4 4 1 6.324555
5 5 1 5.000000
6 1 2 8.062258
7 2 2 0.000000
8 3 2 9.486833
9 4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
13 3 3 0.000000
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
19 4 4 0.000000
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165
25 5 5 0.000000
To this data frame I applied the following command:
data <- data[data$V3 >0,];data #to remove all rows where V3 = 0
giving me this (the point from which I started yesterday):
X Y V3
2 2 1 8.062258
3 3 1 2.236068
4 4 1 6.324555
5 5 1 5.000000
6 1 2 8.062258
8 3 2 9.486833
9 4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165
So far so good. But when I then submit the command
data = data[X>Y,] #to select all rows where X > Y
This won't work in general. It is probably only running in this
particular case because you already have variables named X and Y
defined somewhere in your workspace.
What you wrote above isn't taking the values of X and Y from data$X and
data$Y, respectively, but from the variables X and Y defined elsewhere.
Instead of doing data[X > Y,], do:
data[data$X > data$Y,]
This should get you what you're expecting.
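A small sketch of that lookup rule, using a made-up data frame `d` (not the data from this thread): inside `[`, a bare `X` is resolved in the workspace, so when no such variable exists the call fails, whereas `d$X` always refers to the column.

```r
# Hypothetical toy data frame, for illustration only.
d <- data.frame(X = c(1, 3, 2), Y = c(2, 1, 2))

# Make sure no stray X or Y is lying around in the workspace.
rm(list = intersect(c("X", "Y"), ls()))

# Bare names are looked up in the workspace, not in d, so this call
# fails; tryCatch() captures the error message instead of stopping.
res <- tryCatch(d[X > Y, ], error = function(e) conditionMessage(e))
print(res)

# Naming the columns explicitly is unambiguous.
ok <- d[d$X > d$Y, ]
print(ok)
```

If an unrelated `X` and `Y` do exist in the workspace, the first call runs without any error and silently uses them, which is the harder case to spot.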
I get the problem result already mentioned, namely:
X Y V3
3 3 1 2.236068
4 4 1 6.324555
5 5 1 5.000000
6 1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165
which is clearly wrong! It doesn't matter if I give a new name to the data
frame at each step or not, or whether I use the name "data" or not. It
always gives the same wrong answer.
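One way this output could arise, sketched below, is if free-standing vectors X and Y of length 25 were left in the workspace from building the original data frame. That is an assumption; the thread does not show how X and Y were created. The name `d` and the use of `expand.grid()` to rebuild the table are also choices made for this sketch.

```r
# Rebuild the 25-row data frame shown earlier in this message.
d <- expand.grid(X = 1:5, Y = 1:5)
d$V3 <- c(0.000000, 8.062258, 2.236068, 6.324555, 5.000000,
          8.062258, 0.000000, 9.486833, 2.236068, 5.656854,
          2.236068, 9.486833, 0.000000, 8.062258, 5.099020,
          6.324555, 2.236068, 8.062258, 0.000000, 5.385165,
          5.000000, 5.656854, 5.099020, 5.385165, 0.000000)

# ASSUMPTION: stale workspace copies of the columns, 25 elements each.
X <- d$X
Y <- d$Y

# Drop the rows where V3 is 0; 20 rows remain.
d <- d[d$V3 > 0, ]

# The bare names pick up the 25-element workspace vectors, so the
# TRUE/FALSE mask no longer lines up with the 20 remaining rows.
wrong <- d[X > Y, ]

# The explicit columns always match the current rows.
right <- d[d$X > d$Y, ]

print(rownames(wrong))
print(rownames(right))
```

Under that assumption, `wrong` has the row names 3, 4, 5, 6, 10, 11, 12, 17, 18, 24 reported above, and `right` has the row names 2, 3, 4, 5, 8, 9, 10, 14, 15, 20 that subset() returns.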
However, if I instead use the command:
subset(data, X>Y), I get the right answer, namely:
X Y V3
2 2 1 8.062258
3 3 1 2.236068
4 4 1 6.324555
5 5 1 5.000000
8 3 2 9.486833
9 4 2 2.236068
10 5 2 5.656854
14 4 3 8.062258
15 5 3 5.099020
20 5 4 5.385165
That's because when you use X and Y in your subset(...) call,
subset() evaluates them inside the data frame, so X and Y mean
data$X and data$Y.
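As a rough illustration with a made-up data frame `d` (hypothetical values, not the data above), the three forms below should select the same rows, because each one ends up comparing the columns of `d`:

```r
# Hypothetical toy data frame, for illustration only.
d <- data.frame(X = c(1, 3, 2, 5),
                Y = c(2, 1, 2, 4),
                V3 = c(0.5, 1.5, 2.5, 3.5))

a <- subset(d, X > Y)       # X and Y are evaluated inside d
b <- d[d$X > d$Y, ]         # columns named explicitly
w <- d[with(d, X > Y), ]    # with() also evaluates the expression inside d

print(rownames(a))
print(identical(rownames(a), rownames(b)))
print(identical(rownames(b), rownames(w)))
```

One difference worth knowing: when the condition evaluates to NA for some row, subset() drops that row, while `[` returns a row of NAs for it.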
OK so the lesson so far is "use the subset function".
Hopefully you're learning a slightly different lesson now :-)
Does that clear things up at all?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.