On Sep 1, 2010, at 10:19 AM, David Winsemius wrote:
On Sep 1, 2010, at 9:55 AM, David Winsemius wrote:
On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:
Hi everyone,
I’m looking for a clever bit of code to replace NAs with a specific
score depending on an indicator variable.
I can see how to do it using lots of if statements, but I’m sure there
must be a neater, better way of doing it.
Any ideas at all will be much appreciated; I’m dreading coding up all
those if statements!
My problem is as follows:
I have a data set with lots of missing data:
EG Raw Data Set

Category  variable1  variable2  variable3
1         5          NA         NA
1         NA         3          4
2         NA         7          NA

The lapply()/sapply() code further down (run on egraw) does not do its
work by category (since I got tired of fixing mangled htmlized datasets),
but it seems to me that a tapply "wrap" could do either of these
operations within categories:
Why not try out Hadley's plyr package?

require(plyr)
ddply(egraw2, .(category), .fun = function(df) {
    sapply(df[-1],    # Take out the [-1]
        function(x) {
            mnx <- mean(x, na.rm = TRUE)
            sapply(x, function(z) if (is.na(z)) {mnx} else {z})
        }
    )
})

Tested on:

egraw2 <- data.frame(category = rep(1:4, 4),
                     var1 = sample(c(1:3, NA, NA), 16, replace = TRUE),
                     var2 = sample(c(5:10, NA, NA), 16, replace = TRUE),
                     var3 = sample(c(15:20, NA, NA), 16, replace = TRUE))

It did not throw an error, and only after I sorted that data frame and
the first ddply result did I see that some sort of misregistration had
occurred. Better with:

res <- ddply(egraw2, .(category), .fun = function(df) {
    sapply(df,
        function(x) {
            mnx <- mean(x, na.rm = TRUE)
            sapply(x, function(z) if (is.na(z)) {mnx} else {z})
        }
    )
})

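For comparison, the same per-category fill can be sketched in base R without plyr. This is only an illustration and not something posted in the thread: it rebuilds the small `egraw` example by hand, and `fill_with_mean`, `value_cols`, and `imputed` are hypothetical names introduced here. It relies on `ave()`, which applies a function within each level of a grouping vector and returns the values in the original row order, so the rows stay registered with their categories.

```r
# Example data copied from the thread's `egraw`
egraw <- data.frame(
  Category  = c(1, 1, 2),
  variable1 = c(5, NA, NA),
  variable2 = c(NA, 3, 7),
  variable3 = c(NA, 4, NA)
)

# Hypothetical helper (not from the thread): fill the NAs in one vector
# with the mean of that vector's non-missing values
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

value_cols <- setdiff(names(egraw), "Category")
imputed <- egraw

# ave() runs FUN separately within each Category and puts the results
# back in the original row order
imputed[value_cols] <- lapply(
  egraw[value_cols],
  function(col) ave(col, egraw$Category, FUN = fill_with_mean)
)
imputed
```

A category with no observed value for a variable has no mean to fall back on, so those cells come out as NaN rather than NA (here: category 2 for variable1 and variable3).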
--
David.
> egraw
  Category variable1 variable2 variable3
1        1         5        NA        NA
2        1        NA         3         4
3        2        NA         7        NA

> lapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
      sapply(x, function(z) if (is.na(z)) {mnx} else {z})
      }
  )
$Category
[1] 1 1 2

$variable1
[1] 5 5 5

$variable2
[1] 5 3 7

$variable3
[1] 4 4 4

> sapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
      sapply(x, function(z) if (is.na(z)) {mnx} else {z})
      }
  )
     Category variable1 variable2 variable3
[1,]        1         5         5         4
[2,]        1         5         3         4
[3,]        2         5         7         4
etc
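The two calls above return a list and a matrix respectively. If a data frame is wanted back, one possible variant (a sketch, not code from the thread) is to assign the `lapply()` result into `filled[]`, which keeps the data frame class and column names. The inner element-by-element `sapply()` is swapped here for the vectorised `replace()`; `filled` is a name made up for this example, and like the code above it uses whole-column means, not per-category ones.

```r
# Example data copied from the thread's `egraw`
egraw <- data.frame(
  Category  = c(1, 1, 2),
  variable1 = c(5, NA, NA),
  variable2 = c(NA, 3, 7),
  variable3 = c(NA, 4, NA)
)

filled <- egraw

# Assigning into filled[] replaces the columns but keeps the data.frame
# structure; replace() swaps every NA for the column mean in one step
filled[] <- lapply(egraw, function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
})
filled
```

The values agree with the `sapply()` matrix shown above; only the container differs.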
Now I want to replace the NA’s with the average for each category, so if
these averages were:

EG Averages

Category  variable1  variable2  variable3
1         4.5        3.2        2.5
2         3.5        7.4        5.9

So I’d like my data set to look like the following once I’ve replaced
the NA’s with the appropriate category average:

EG Imputed Data Set

Category  variable1  variable2  variable3
1         5          3.2        2.5
1         4.5        3          4
2         3.5        7          5.9
etc
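If the category averages are supplied as their own table, as in the example above, and not computed from the raw data, the fill can be sketched as a lookup. This is an illustration under assumptions: the data frames `raw` and `averages` are typed in by hand from the example tables, and the names `row_in_avg`, `is_missing`, and `imputed` are invented here. `match()` finds, for each raw row, the row of `averages` with the same Category.

```r
# Raw data and supplied averages, typed in from the example tables
raw <- data.frame(
  Category  = c(1, 1, 2),
  variable1 = c(5, NA, NA),
  variable2 = c(NA, 3, 7),
  variable3 = c(NA, 4, NA)
)
averages <- data.frame(
  Category  = c(1, 2),
  variable1 = c(4.5, 3.5),
  variable2 = c(3.2, 7.4),
  variable3 = c(2.5, 5.9)
)

# For each raw row, the index of the averages row with the same Category
row_in_avg <- match(raw$Category, averages$Category)

imputed <- raw
for (v in setdiff(names(raw), "Category")) {
  is_missing <- is.na(raw[[v]])
  # Only the missing cells are overwritten; observed values are kept
  imputed[[v]][is_missing] <- averages[[v]][row_in_avg[is_missing]]
}
imputed
```

On this input the result reproduces the "EG Imputed Data Set" table. A raw Category that does not appear in `averages` would leave its NAs in place, since `match()` returns NA for it.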
Any ideas would be very much appreciated!!!!!
You might add reading the Posting Guide and setting up your reader
to post in plain text to your TODO list.
Thank you,
Chris Howden
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.