On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:
Hi

First of all, thanks for this service; it has been very useful to me. I am new
to R, so I have a lot of questions.

I have to impute values in a data set. Here is a sample of it:


   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5



The problem is that some cells have to be empty, depending on Data1 and Data2.

For instance, in the third row Data1 is 16/10/2009, so there should be no
information before 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007
and IE.2008 have to be completely empty. This does not mean that they are
missing values; in fact they are not, and I don't want any imputation in these
cells.

IE.2009 and IE.2010 have to be filled and they are not, so these cells are
missing values and I want to get imputed values for them. (I would delete this
row, because it is impossible to get any information about it, but it is OK for
this example.)

On the other hand, in the last row NA is a real missing value.



How can I specify that these cells are empty so that they don't get imputed
values?

I have tried to put NaN, but then I have problems with some functions that I
need to run before the imputation.

Hi Teresa,
I didn't see an answer to this, so I'll offer a couple of suggestions. First, NA is probably the best thing to have in your "empty" cells. If you change the NA cells to "", the columns will stop being numeric (they become character, or factors if the data are read in that way), and if you then change the values back to numeric, the blanks will become NAs again.
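
As a small illustration of that round trip, here is a sketch on a toy vector. The names x, x_blank and x_back are made up for the example and stand in for one of the IE columns:

```r
# Toy column standing in for one of the IE.* columns
x <- c(153, NA, 35)

# Writing "" into a numeric vector coerces the whole vector to character
x_blank <- x
x_blank[is.na(x_blank)] <- ""
class(x_blank)

# Converting back to numeric turns the blanks into NA again
x_back <- as.numeric(x_blank)
is.na(x_back)
```

So the blank buys nothing over NA: the numeric column is lost on the way in, and the NA comes back on the way out.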

I would get a set of index vectors marking the cells you _don't_ want to impute (say your data frame is tmsdf):

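# start year of each record, from the dd/mm/yyyy string in Data1
# (as.character in case Data1 was read in as a factor)
startyear<-as.numeric(sapply(strsplit(as.character(tmsdf$Data1),"/"),"[",3))
# a cell is meant to be empty when the record starts after the column's year
dontimpute2003<-which(startyear > 2003 & is.na(tmsdf$IE.2003))
dontimpute2004<-which(startyear > 2004 & is.na(tmsdf$IE.2004))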
...

then do your imputation on the entire data frame and reset the ones you don't want imputed to NA:

tmsdf$IE.2003[dontimpute2003]<-NA
...
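
The same idea can be written once for all the IE columns instead of one vector per year. The following is only a sketch under some assumptions: it uses three rows and three columns from the sample data, the names iecols, ieyears, dontimpute and imputed are invented for the example, and the row-mean fill is just a placeholder for whatever imputation method is really used. It only looks at Data1; an end date in Data2 could presumably be handled with a second comparison of the same kind.

```r
# Three rows from the sample data, reduced to a few columns for brevity
tmsdf <- data.frame(
  NUMERO  = c(146, 154, 177),
  Data1   = c("16/10/2009", "21/12/2004", "20/02/1996"),
  IE.2008 = c(NA, 233, NA),
  IE.2009 = c(NA, 194, 5),
  IE.2010 = c(35, 204, 5),
  stringsAsFactors = FALSE
)

# Start year of each record, from the dd/mm/yyyy string in Data1
startyear <- as.numeric(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))

# IE.* columns and the year each one refers to
iecols  <- grep("^IE\\.", names(tmsdf), value = TRUE)
ieyears <- as.numeric(sub("^IE\\.", "", iecols))

# TRUE where the record starts after the column's year and the cell is NA
m <- as.matrix(tmsdf[iecols])
dontimpute <- outer(startyear, ieyears, ">") & is.na(m)

# Placeholder imputation (row mean of the observed values);
# replace this step with the real imputation method
fill <- is.na(m)
m[fill] <- rowMeans(m, na.rm = TRUE)[row(m)[fill]]

# Reset the cells that are meant to be empty back to NA
m[dontimpute] <- NA
imputed <- tmsdf
imputed[iecols] <- m
```

With these rows, only IE.2008 for NUMERO 146 is flagged, so it stays NA after the reset, while the NA in the last row is treated as a real missing value and keeps its imputed value.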

Jim