[R] Outliers Help

Mª Teresa Martinez Soriano Fri, 30 Aug 2013 01:16:00 -0700

This is a part of my data set:

> D[1:15,c(1,5:10)]


       X.      media IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
1   1108   22.00000    60.0      39     4.0     8.0    16.0     5.0
2   1479  110.00000      NA      NA    53.0  1166.0   344.8   110.0
3   1591   86.60000   247.0      87    95.0    94.0    81.0    76.0
4   3408  807.00000   302.0     322   621.0  1071.0  1301.0  1225.0
5   3423    9.00000      NA      NA      NA   410.8     7.0    11.0
6   3872  103.25000   288.6     113   116.0    90.0    94.0 12036.6
7   5823   73.00000   117.0      70    80.0    74.0    69.0    72.0
8   6051   73.00000      NA      NA      NA      NA    60.0    86.0
9   8099  125.16667   196.0     161   150.0    94.0    72.0    78.0
10  8100   70.00000      NA      NA      NA      NA    48.0    92.0
11 10640   67.33333  1256.6    1152   664.2    74.0    77.0    51.0
12 12600 2417.00000  1960.0    2383  2453.0  2506.0  2758.0  2442.0
13 14680   38.00000    30.0      61   373.6    42.0    19.0   220.8
14 14698  698.16667   553.0     664   847.0   800.0   679.0   646.0
15 17143  392.16667   323.0     322   434.0   383.0   459.0   432.0

 



I have done multiple imputation and now I have some outliers which I would
like to replace with the mean of the row or, if possible, with the mean of
the previous and the next value in the row. For instance, given

value 1 - outlier - value 2

I would like to replace the outlier with the mean of value 1 and value 2. The
problem is that these values could be NA (still NA after the imputation because
they do not exist); in that case I would like to replace the outlier with the
mean of the row.
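
A minimal sketch of the replacement rule described above, applied to one row at a time. The function name `replace_outlier_row` and the logical vector `is_out` that marks the outliers are hypothetical, and how `is_out` is built is a separate question. The sketch also assumes that the fallback row mean should ignore both the NAs and the flagged values, and that a neighbour which is itself flagged does not count as usable.

```r
# Hypothetical helper: x is one row of yearly values, is_out marks its outliers.
replace_outlier_row <- function(x, is_out) {
  # Assumption: the fallback row mean ignores NAs and the flagged values.
  row_mean <- mean(x[!is_out], na.rm = TRUE)
  out <- x
  for (j in which(is_out)) {
    has_prev <- j > 1 && !is.na(x[j - 1]) && !is_out[j - 1]
    has_next <- j < length(x) && !is.na(x[j + 1]) && !is_out[j + 1]
    if (has_prev && has_next) {
      out[j] <- (x[j - 1] + x[j + 1]) / 2   # mean of the two neighbours
    } else {
      out[j] <- row_mean                    # a neighbour is missing: use the row mean
    }
  }
  out
}

# Row X. = 1479 from the data shown above, treating 1166 as the outlier
x <- c(NA, NA, 53, 1166, 344.8, 110)
replace_outlier_row(x, c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE))
# the 4th value becomes (53 + 344.8) / 2 = 198.9
```

For an outlier in the first or last year there is only one neighbour, so this sketch falls back to the row mean there as well.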


Another problem I have is detecting outlier values correctly. For instance, in
this sample of the data set we can see an outlier for X. = 3872 in IE.2010. I
have thought of comparing the values with the row mean (column media).
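
A possible sketch of this comparison with the `media` column, using two of the rows shown above. The threshold of 4 is the one used in the code attempt in this message; the function name `flag_outliers` is hypothetical. This only flags unusually large values, and it can only work if `media` is not itself inflated by the outlier.

```r
# Hypothetical helper: flag cells more than `ratio` times the reference value of their row.
flag_outliers <- function(values, ref, ratio = 4) {
  flagged <- values / ref > ratio   # ref is recycled down the columns: one value per row
  flagged[is.na(flagged)] <- FALSE  # NA cells are never flagged
  flagged
}

# Rows X. = 3872 and X. = 5823 from the data shown above
values <- rbind(c(288.6, 113, 116, 90, 94, 12036.6),
                c(117.0,  70,  80, 74, 69,    72.0))
colnames(values) <- paste0("IE.", 2005:2010)
media <- c(103.25, 73)

which(flag_outliers(values, media), arr.ind = TRUE)  # row 1, column 6 (IE.2010)
```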

I have tried this code:
 

 D <- datos[, c(1, 16:24)]
 m <- as.matrix(D)
 for (i in 1:nrow(D)) {
   # I would change this range in the new data set, because I will have more
   # years than 2010
   for (j in 5:(ncol(D) - 1)) {
     # Here I would like to find the values that are much bigger than the
     # mean of this row, and replace them by the mean of the previous and the
     # next values of the same row.
     if (!is.na(m[i, j]) && !is.na(m[i, j + 1]) && !is.na(m[i, j - 1]) &&
         !is.na(m[i, 2]) && ((m[i, j] / m[i, 2]) > 4)) {
       m[m[i, j]] <- (m[i, j - 1] + m[i, j + 1]) / 2
     }
   }
 }
 D <- as.data.frame(m)

But I get the same data.frame I had before; nothing changes.
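
One likely cause, offered as a guess since `datos` is not available here: the assignment `m[m[i,j]] <- ...` uses the value of the cell as an index instead of its position, so the cell `m[i,j]` itself is never overwritten. Below is a sketch of the same loop with `m[i, j]` on the left-hand side. The small matrix and the starting column 4 are made up for this example only; column 2 plays the role of `media`.

```r
# Made-up example: column 1 = id, column 2 = row mean (media),
# columns 3 to 7 = yearly values.
m <- rbind(c(3872, 103.25, 113, 116, 12036.6, 90, 94),
           c(5823,  73.00,  70,  80,    74.0, 69, 72))

for (i in 1:nrow(m)) {
  # Start at the second yearly column and stop one before the last,
  # so that both neighbours j - 1 and j + 1 are yearly values.
  for (j in 4:(ncol(m) - 1)) {
    if (!is.na(m[i, j]) && !is.na(m[i, j - 1]) && !is.na(m[i, j + 1]) &&
        !is.na(m[i, 2]) && m[i, j] / m[i, 2] > 4) {
      m[i, j] <- (m[i, j - 1] + m[i, j + 1]) / 2  # assign by position, not by value
    }
  }
}
m[1, 5]  # (116 + 90) / 2 = 103
```

An outlier in the first or last yearly column is never visited by this loop, which is where the row-mean fallback would be needed.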




Any ideas are welcome.
Thanks a lot, Teresa

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
