On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:
Hi
First of all, thanks for this service; it has been very useful to me. I am new
to R, so I have a lot of questions.
I have to impute missing values in a data set. Here is a sample of it:
   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5
The problem is that some cells have to be empty, depending on Data1 and Data2.
For instance, in the third row Data1 is 16/10/2009, so there is no information
before 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 and IE.2008
have to be completely empty, but this doesn't mean that they are missing
values; in fact they are not. I don't want any imputation in these cells.
IE.2009 and IE.2010 have to be filled and they are not, so these cells are
missing values and I want to get imputed values for them. (I would delete this
row, because it is impossible to get any information about it, but it is fine
for this example.)
On the other hand, in the last row NA is a real missing value.
How can I specify that these cells are empty so that they don't get imputed
values? I have tried using NaN, but then I have problems with some functions
that I need to run before the imputation.
Hi Teresa,
I didn't see an answer to this, so I'll offer a couple of suggestions.
First, NA is probably the best thing to have in your "empty" cells. If
you change the NA cells to "", the columns will no longer be numeric
(they become character or factor), and if you then change the values
back to numeric, the blanks will become NAs again.
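A small sketch of that round trip on a plain vector (the values are made up
for illustration; in a data frame the column may end up as character or
factor depending on how the data were read in):

```r
# toy numeric column with one "empty" cell
x <- c(153, NA, 279)

# assigning "" coerces the whole vector to character
x[is.na(x)] <- ""
class(x)            # "character"

# converting back to numeric turns the blank into NA again
y <- as.numeric(x)
is.na(y)            # FALSE TRUE FALSE
```

So blanking the cells gains nothing: they end up as NA either way.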
I would get a set of index vectors that identify the cells you _don't_
want to impute (say your data frame is tmsdf):
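# year in which each record starts (third field of the dd/mm/yyyy date)
startyear<-as.numeric(sapply(strsplit(as.character(tmsdf$Data1),"/"),"[",3))
# rows that start after the given year and so must stay empty in that column
dontimpute2003<-which(startyear > 2003 & is.na(tmsdf$IE.2003))
dontimpute2004<-which(startyear > 2004 & is.na(tmsdf$IE.2004))
...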
then do your imputation on the entire data frame and reset the cells you
don't want imputed to NA:
tmsdf$IE.2003[dontimpute2003]<-NA
...
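The same idea can be written once for all years instead of one vector per
column. This is only a sketch under some assumptions: the toy data frame
copies three rows and three IE columns from the sample, the column-mean fill
is a stand-in for whatever imputation method you actually use, and the names
years, iecols, structural and imputed are made up for the example.

```r
# toy data frame with three rows of the posted sample
tmsdf <- data.frame(
  NUMERO  = c(146, 154, 177),
  Data1   = c("16/10/2009", "21/12/2004", "20/02/1996"),
  Data2   = c("18/06/2013", "18/06/2013", "18/06/2013"),
  IE.2003 = c(NA, NA, 16),
  IE.2004 = c(NA, NA, 4),
  IE.2005 = c(NA, 148, NA),
  stringsAsFactors = FALSE
)

years  <- 2003:2005
iecols <- paste("IE", years, sep = ".")

# start year of each record (third field of the dd/mm/yyyy date)
startyear <- as.numeric(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))

# TRUE where a cell is NA and its year is before the record's start year
structural <- outer(startyear, years, ">") & is.na(as.matrix(tmsdf[iecols]))

# placeholder imputation (column means); replace with the real method
imputed <- tmsdf
for (col in iecols) {
  miss <- is.na(imputed[[col]])
  imputed[[col]][miss] <- mean(imputed[[col]], na.rm = TRUE)
}

# reset the cells that must stay empty
for (j in seq_along(iecols)) {
  imputed[[iecols[j]]][structural[, j]] <- NA
}
```

Cells after the end of a record could presumably be handled the same way,
by comparing the year taken from Data2 with "<" instead of ">".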
Jim
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.