Re: [R] na.omit - Is it working properly?

peter dalgaard Tue, 03 May 2011 23:03:49 -0700

On May 3, 2011, at 21:18 , Kalicin, Sarah wrote:

> 
> I have a workaround for this, but can someone explain why the first example
> does not work properly? I believe it worked in the previous version of R,
> selecting just the rows where F.WW == 200525 and omitting the NAs. I just
> upgraded to R 2.13. I am also concerned that the row numbers differ between
> the selections; should I be worried? FYI, I only show the first few rows of
> each result for demonstration, so please ignore that the numbers of rows
> shown are not equal. - Sarah
> 
> With na.omit around the column, it shows values other than 200525 in the
> F.WW column, along with NA. I was hoping that this would omit all the NAs
> and show all the rows where P$F.WW == 200525. I believe it did with the
> previous version of R.


That's highly unlikely. na.omit(P$F.WW) has fewer elements than there are rows
in P, so the logical index gets recycled in the style of

> thuesen[c(F,F,F,F,T),]
   blood.glucose short.velocity
5            7.2           1.27
10          12.2           1.22
15           6.7           1.52
20          16.1           1.05

(now why don't we get the usual warning about "not a multiple of" in this case?)
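A minimal sketch of the recycling effect, using a made-up 10-row data frame (`df`, `id` and `x` are hypothetical stand-ins for P and its F.WW column, not the poster's data):

```r
# Hypothetical stand-in for P: 10 rows, three of them missing in x
df <- data.frame(id = 1:10, x = c(5, NA, 5, 7, NA, 5, 8, 5, NA, 5))

# na.omit() drops the NAs, so this comparison has 7 elements, not 10
idx <- na.omit(df$x) == 5

# A logical row index shorter than nrow(df) is recycled without a warning,
# and its positions refer to the shortened vector, not to the rows of df
sel <- df[idx, ]
sel$id   # 1 2 4 6 7 8 9: picks up rows where x is NA, 7 or 8
```

The rows that actually hold 5 are 1, 3, 6, 8 and 10, so the result is simply wrong rather than merely incomplete.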

Worse, if you omit observations prior to the comparison, the result won't line
up with the rows. E.g. in the thuesen data, where obs. 16 is missing:

> thuesen[na.omit(thuesen$short.velocity)==1.12,]
   blood.glucose short.velocity
16           8.6             NA
22           4.9           1.03

whereas in fact 

> subset(thuesen, short.velocity==1.12)
   blood.glucose short.velocity
17           4.2           1.12
23           8.8           1.12
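The same misalignment shows up on a plain vector. A sketch with a made-up vector `x` (hypothetical, not the thuesen data):

```r
# Hypothetical vector with a missing value in position 2
x <- c(1.10, NA, 1.12, 1.30, 1.12)

# After na.omit() every element behind the NA moves up one slot,
# so the matches are reported at the shifted positions 2 and 4
which(na.omit(x) == 1.12)   # 2 4

# Comparing the full vector keeps positions aligned with the original data;
# which() simply skips the NA produced by NA == 1.12
which(x == 1.12)            # 3 5
```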



> P[na.omit(P$F.WW)==200525, c(51, 52)]
>          F.WW        R.WW
> 45      200525          NA
> 53          NA          NA
> 61      200534      200534
> 63      200608      200608
> 66      200522      200541
> 80          NA          NA
> 150     200521      200516
> 231     200530      200530
> 
> Without na.omit, selecting F.WW == 200525 seems to work, but lots of NA rows
> are included. This is what I expected! The row numbers are not the same as in
> the example above, except for the first row.
>> P[P$F.WW==200525, c(51, 52)]
>            F.WW     R.WW
> 45        200525          NA
> NA            NA          NA
> NA.1          NA          NA
> NA.2          NA          NA
> NA.3          NA          NA
> 57        200525      200526
> 65        200525          NA
> 67        200525          NA
> 70        200525      200525
> NA.4          NA          NA
> NA.5          NA          NA
> 86        200525          NA

Presumably, a number of rows got omitted here? The NAs are a bit of a pain,
but that's the way things work: if there is an observation that you don't know
whether to include, you get an NA-filled row.

> thuesen[thuesen$short.velocity==1.12,]
   blood.glucose short.velocity
NA            NA             NA
17           4.2           1.12
23           8.8           1.12

To avoid this, test for NA explicitly using is.na(), or use subset(), which
does it internally (an NA condition is treated as FALSE).
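The usual ways around the NA-filled row, sketched on a made-up data frame (`df`, `w` and `r` are hypothetical stand-ins for P, F.WW and R.WW):

```r
# Hypothetical data frame with a missing value in the column we filter on
df <- data.frame(w = c(200525, NA, 200534, 200525),
                 r = c(NA, NA, 200534, 200526))

# Plain logical indexing: NA == 200525 is NA, which yields an all-NA row
bad <- df[df$w == 200525, ]

# Option 1: make the test explicit about missing values (FALSE & NA is FALSE)
ok1 <- df[!is.na(df$w) & df$w == 200525, ]

# Option 2: subset() treats an NA condition as FALSE
ok2 <- subset(df, w == 200525)

# Option 3: which() returns only the positions that are TRUE
ok3 <- df[which(df$w == 200525), ]
```

For the original question, the first option would read P[!is.na(P$F.WW) & P$F.WW == 200525, c(51, 52)]. Unlike wrapping the result in na.omit(), none of these drop a row just because R.WW is missing.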

> 
> na.omit excludes the NAs, which is what I want. My concern is why the row
> numbers do not match any of those shown in the examples above.
>> na.omit(P[P$F.WW==200525, c(51, 52)])
>        F.WW        R.WW
> 57    200525      200526
> 70    200525      200525
> 161   200525      200525
> 245   200525      200525
> 246   200525      200525
> 247   200525      200526
> 256   200525      200525
> 266   200525      200525
> 269   200525      200525
> 271   200525      200526
> 276   200525      200526
> 278   200525      200526
> 

Well, now you remove rows with an NA _anywhere_, so e.g. row #65 is out because
R.WW is missing. I expect #161 and higher were just chopped from the earlier
list.
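To see the difference between dropping NAs in the filter column and dropping incomplete rows, here is a sketch on made-up data (`df`, `w` and `r` are hypothetical stand-ins for P, F.WW and R.WW):

```r
# Hypothetical stand-in for P[, c(51, 52)]: row 2 matches but r is missing
df <- data.frame(w = c(200525, 200525, NA, 200525),
                 r = c(200526, NA, NA, 200525))

# NA-safe filter on w alone: keeps rows 1, 2 and 4
by_w <- df[which(df$w == 200525), ]

# na.omit() on the result also drops rows with an NA in *any* column,
# so row 2 goes away even though its w matches
complete <- na.omit(by_w)

rownames(by_w)      # "1" "2" "4"
rownames(complete)  # "1" "4"
```

Row 2 plays the part of row #65 above: it matches on F.WW but is removed because R.WW is NA.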

In short, nothing out of the ordinary seems to be going on here.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
