On Sun, 19 Jan 2014 11:39:43 -0800 (PST)
kingsly <[email protected]> wrote:

> Dear R community
>  
> I have a large data set that contains some empty cells. Because of
> that (I may be wrong), <NA> values are produced. Now I want to replace
> both empty and <NA> values with zero.
> Elder1 <- data.frame(
>   ID=c("ID1","ID2","ID3","ID6","ID8"),
>   age=c(38,35,"",NA,NA))
> Output I am expecting
>  
> ID   age
> ID1  38
> ID2  35
> ID3  0
> ID6  0
> ID8  0
>  
> Thank you in advance for your help.
> 
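The age variable is being read in as a factor because of the "":
mixing a string with numbers makes the whole vector character data,
which data.frame() then converts to a factor (its default behaviour
before R 4.0.0).  If you replace the "" with NA, the column becomes
numeric: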

Before replacement:

str(Elder1)
'data.frame':   5 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
 $ age: Factor w/ 3 levels "","35","38": 3 2 1 NA NA

Notice that the "" is treated as a factor level.

After:

str(Elder1)
'data.frame':   5 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
 $ age: num  38 35 NA NA NA
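
One way to get from the "Before" state to the "After" state without
retyping the data is sketched below.  It is only a sketch: it assumes
the column is a factor as shown above (stringsAsFactors = TRUE is set
explicitly so the example behaves the same on any R version), and
age_chr is just an illustrative temporary name.

```r
# Rebuild the poster's data with age stored as a factor.
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, "", NA, NA),
  stringsAsFactors = TRUE)

# Convert via character first: as.numeric() applied directly to a
# factor returns the internal level codes, not the original values.
age_chr <- as.character(Elder1$age)

# Mark the empty strings as missing (%in% never returns NA).
age_chr[age_chr %in% ""] <- NA

Elder1$age <- as.numeric(age_chr)
str(Elder1)
```

The detour through as.character() is the important part; skipping it
would silently turn 38 and 35 into the codes 3 and 2.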

So the question is: what do you want to do with that column?  An NA
value tells you honestly that the information is missing.  Replacing it
with a zero can be misleading and can bias basic estimates such as the
mean.
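
To make the bias concrete, here is a small illustration using the age
values from the example above.  The variable names age and age_zero
are only for this sketch.

```r
age <- c(38, 35, NA, NA, NA)

# Ignore the missing values: the average of the two known ages.
mean(age, na.rm = TRUE)   # 36.5

# Replace the missing values with zero and average again.
age_zero <- age
age_zero[is.na(age_zero)] <- 0
mean(age_zero)            # 14.6
```

The three zeros are counted as real ages of 0, which pulls the mean
from 36.5 down to 14.6.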

After you know how you want to treat the data in that field, you may
have a better idea of how to handle the missing data.
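
If zero really is the value you want (for display, say), one possible
approach is sketched below.  It assumes age is already numeric with NA
for the unknown values, and the age_missing column is a hypothetical
addition that records which zeros were originally missing.

```r
# The data with age numeric and NA for unknown values.
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, NA, NA, NA))

# Remember which ages were missing before overwriting them.
Elder1$age_missing <- is.na(Elder1$age)

# Replace the missing ages with zero.
Elder1$age[Elder1$age_missing] <- 0

Elder1
```

Keeping the indicator column means the zeros can still be excluded
later, e.g. mean(Elder1$age[!Elder1$age_missing]).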

JWD

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
