On May 3, 2011, at 21:18, Kalicin, Sarah wrote:

>
> I have a work around for this, but can someone explain why the first example
> does not work properly? I believed it worked in the previous version of R, by
> selecting just the rows=200525 and omitting the na's. I just upgraded to
> 2.13. I am also concern with the row numbers being different in the
> selections, should I be worried? FYI, I just selected the first few rows for
> demonstration, please do not worry that the number of rows shown are not
> equal. - Sarah
>
> With na.omit around the column, but it is showing other values in the F.WW
> column other than 200525, along with NA. I was hoping that this would omit
> all the NA's, and show all the rows that P$F.WW=200525. I believe it did with
> the previous version of R.

That's highly unlikely. na.omit(P$F.WW) has fewer elements than there are
rows in P, so you get vector recycling in the style of

> thuesen[c(F,F,F,F,T),]
   blood.glucose short.velocity
5            7.2           1.27
10          12.2           1.22
15           6.7           1.52
20          16.1           1.05

(now why don't we get the usual warning about "not a multiple of" in this
case?)

Worse, if you omit observations prior to comparison, the result won't line
up. E.g. in the thuesen data, where obs. 16 has a missing short.velocity:

> thuesen[na.omit(thuesen$short.velocity)==1.12,]
   blood.glucose short.velocity
16           8.6             NA
22           4.9           1.03

whereas in fact

> subset(thuesen, short.velocity==1.12)
   blood.glucose short.velocity
17           4.2           1.12
23           8.8           1.12

> P[na.omit(P$F.WW)==200525, c(51, 52)]
>       F.WW   R.WW
> 45  200525     NA
> 53      NA     NA
> 61  200534 200534
> 63  200608 200608
> 66  200522 200541
> 80      NA     NA
> 150 200521 200516
> 231 200530 200530
>
> No na.omit, the F.WW=200525 seems to work, but lots of NA included. This is
> what is expected!! The row numbers are not the same as the above example,
> except the first row.
>> P[P$F.WW==200525, c(51, 52)]
>        F.WW   R.WW
> 45   200525     NA
> NA       NA     NA
> NA.1     NA     NA
> NA.2     NA     NA
> NA.3     NA     NA
> 57   200525 200526
> 65   200525     NA
> 67   200525     NA
> 70   200525 200525
> NA.4     NA     NA
> NA.5     NA     NA
> 86   200525     NA

Presumably, a number of rows got omitted here?

The NA's are a bit of a pain, but that's the way things work: If there is an
observation that you don't know whether to include, you get an NA filled row.

> thuesen[thuesen$short.velocity==1.12,]
   blood.glucose short.velocity
NA            NA             NA
17           4.2           1.12
23           8.8           1.12

To avoid this, you explicitly test for NA using is.na() or use subset(),
which does it internally.

>
> Na.omit excludes the na's. This is what I want. The concern I have is why the
> row numbers do not match any of those shown in the examples above.
>> na.omit(P[P$F.WW==200525, c(51, 52)])
>       F.WW   R.WW
> 57  200525 200526
> 70  200525 200525
> 161 200525 200525
> 245 200525 200525
> 246 200525 200525
> 247 200525 200526
> 256 200525 200525
> 266 200525 200525
> 269 200525 200525
> 271 200525 200526
> 276 200525 200526
> 278 200525 200526
>

Well, now you remove rows with NA _anywhere_, so e.g. row #65 is out because
R.WW is missing. I expect #161 and higher was just chopped from the earlier
list.

In short, nothing out of the ordinary seems to be going on here.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
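
For anyone who wants to reproduce the three behaviours discussed in this thread without the ISwR or the poster's data, here is a minimal sketch on a made-up data frame. The names d, x and y are hypothetical stand-ins (x plays the role of F.WW, y of R.WW); nothing here comes from the original P.

```r
# Hypothetical toy data, NOT the poster's P: x stands in for F.WW, y for R.WW
d <- data.frame(x = c(1, NA, 2, 1, 2, 1),
                y = c(10, 20, NA, 40, 50, NA))

# 1. Dropping NAs before comparing shortens the index (5 values for 6 rows),
#    so it no longer lines up with the rows of d and gets recycled.
bad   <- na.omit(d$x) == 1
wrong <- d[bad, ]              # picks up rows where x is not 1

# 2. Comparing directly keeps the alignment, but NA == 1 is NA,
#    and an NA in a logical row index produces an all-NA row.
with_na <- d[d$x == 1, ]

# 3. Ways to keep only the rows where x is known to equal 1
keep1 <- d[!is.na(d$x) & d$x == 1, ]   # explicit is.na() test
keep2 <- subset(d, x == 1)             # subset() treats NA as FALSE
keep3 <- d[which(d$x == 1), ]          # which() drops NA positions

# 4. na.omit() on the data frame drops rows with an NA in ANY column,
#    so rows with x == 1 but a missing y disappear as well.
strict <- na.omit(d[d$x == 1, ])

print(wrong); print(with_na); print(keep1); print(strict)
```

Comparing the row names of wrong, with_na, keep1 and strict should show the same pattern as in the quoted output: different row numbers in each case, and none of it specific to R 2.13.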