Hi,

See comments inline:

On Aug 11, 2009, at 2:45 PM, Jim Bouldin wrote:


No problem John, thanks for your help, and also thanks to Dan and Patrick. Wasn't able to read or try anybody's suggestions yesterday. Here's what
I've discovered in the meantime:

What I did not include yesterday is that my original data frame, called
"data", was this:

   X Y       V3
1  1 1 0.000000
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
7  2 2 0.000000
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
13 3 3 0.000000
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
19 4 4 0.000000
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165
25 5 5 0.000000

To this data frame I applied the following command:

data <- data[data$V3 > 0, ]; data  # to remove all rows where V3 = 0

giving me this (the point from which I started yesterday):

   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165

So far so good.  But when I then submit the command
data = data[X>Y,] #to select all rows where X > Y

This won't work in general. It probably only runs without an error here because variables named X and Y are already defined somewhere in your workspace.

What you wrote above doesn't take the values of X and Y from data$X and data$Y, respectively, but from the variables X and Y defined elsewhere.

Instead of doing data[X > Y,], do:

data[data$X > data$Y,]

This should get you what you're expecting.
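
Here is a small sketch of how the two forms can diverge. It assumes (this is only a guess about your session) that the workspace X and Y are copies of the columns of the original 25-row data frame, for example left over from attach(). The names "full", "wrong" and "right" are made up for the illustration, and V3 is left out because it plays no part in the row selection:

```r
# X and Y columns as in the post; V3 is omitted because it is not needed here.
full <- expand.grid(X = 1:5, Y = 1:5)

# ASSUMPTION: stale workspace copies taken from the 25-row frame.
X <- full$X
Y <- full$Y

# In this data the V3 == 0 rows are exactly the rows where X == Y.
data <- full[full$X != full$Y, ]   # 20 rows remain

wrong <- data[X > Y, ]             # 25-element mask built from the workspace X, Y
right <- data[data$X > data$Y, ]   # 20-element mask built from the columns

rownames(wrong)   # row labels picked by the stale mask
rownames(right)   # row labels where X really exceeds Y
```

Under that assumption the stale mask lines up with the old row positions, not the current ones, so "wrong" picks up rows such as 6 (X = 1, Y = 2), while "right" contains only rows with X > Y.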

I get the problem result already mentioned, namely:

   X Y       V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

which is clearly wrong! It doesn't matter if I give a new name to the data frame at each step or not, or whether I use the name "data" or not. It
always gives the same wrong answer.

However, if I instead use the command:
subset(data, X>Y), I get the right answer, namely:

   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
14 4 3 8.062258
15 5 3 5.099020
20 5 4 5.385165

That's because subset(...) evaluates its condition inside the data frame, so the X and Y in your subset(...) call are taken to mean data$X and data$Y.
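
As a rough illustration, the three spellings below should pick the same rows. The data frame is rebuilt with only the X and Y columns (V3 is left out), and the names "by_subset", "by_dollar" and "by_with" are just placeholders for this sketch:

```r
# X and Y columns as in the post, without the diagonal (the V3 == 0 rows).
data <- expand.grid(X = 1:5, Y = 1:5)
data <- data[data$X != data$Y, ]

by_subset <- subset(data, X > Y)          # X and Y are looked up in data first
by_dollar <- data[data$X > data$Y, ]      # explicit column references
by_with   <- data[with(data, X > Y), ]    # with() evaluates X > Y inside data

# Compare the selected row labels across the three approaches.
identical(rownames(by_subset), rownames(by_dollar))
identical(rownames(by_dollar), rownames(by_with))
```

Note that subset() only falls back to the calling environment for names it cannot find in the data frame, so a stray X or Y in the workspace does not get in the way there.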

OK so the lesson so far is "use the subset function".

Hopefully you're learning a slightly different lesson now :-)

Does that clear things up at all?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

