Re: [R] Outliers Help

arun Fri, 30 Aug 2013 05:47:14 -0700

Hi,

Also,


dd1<-matrix(cbind(D[,1],(D[-c(1:2)]/D[,2]>4)*1),dimnames=NULL,ncol=7)
identical(dd,dd1)
#[1] TRUE
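
For readers who find the one-liner dense, here is a minimal sketch of the same idea split into steps. The two-row `D` below is a cut-down stand-in built from values posted later in the thread (rows X. = 1108 and X. = 3872, two year columns only), and the names `ratio` and `flags` are just illustrative.

```r
# Cut-down stand-in for D: id, row mean, then two of the year columns.
D <- data.frame(
  X.      = c(1108, 3872),
  media   = c(22, 103.25),
  IE.2009 = c(16, 94),
  IE.2010 = c(5, 12036.6)
)

ratio <- D[-(1:2)] / D[, 2]   # every year column divided by the row mean
flags <- (ratio > 4) * 1      # logical matrix turned into 0/1
# Bind the id column back on and strip the dimnames, as in the one-liner.
dd1 <- matrix(cbind(D[, 1], flags), ncol = 3, dimnames = NULL)
dd1
```

Only the IE.2010 value in the second row is more than 4 times its row mean, so that is the only cell flagged with a 1.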
A.K.






----- Original Message -----
From: Jose Iparraguirre <jose.iparragui...@ageuk.org.uk>
To: Mª Teresa Martinez Soriano <teresama...@hotmail.com>; 
"r-help@r-project.org" <r-help@r-project.org>
Cc: 
Sent: Friday, August 30, 2013 5:39 AM
Subject: Re: [R] Outliers Help

Hi Ma Teresa,

Sorry, but I can't understand what you're trying to achieve.
On a statistical note, I'd tend to think more in terms of medians and would 
think hard before replacing any outliers, but that's another matter.
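
As a rough illustration of what thinking in terms of medians could look like, the sketch below flags a value when it exceeds 4 times the row median rather than the row mean. The cutoff of 4 is simply carried over from the original question, the two rows are copied from the data posted further down the thread, and the names `ie`, `row_med` and `flags` are made up for this example. It is one possible reading, not a recommendation of a particular rule.

```r
# Two rows from the posted data (X. = 3872 and X. = 5823), years 2005 to 2010.
ie <- rbind(
  "3872" = c(288.6, 113, 116, 90, 94, 12036.6),
  "5823" = c(117, 70, 80, 74, 69, 72)
)
colnames(ie) <- paste0("IE.", 2005:2010)

# Row medians are much less affected by a single extreme value than row means.
row_med <- apply(ie, 1, median, na.rm = TRUE)

# Dividing a matrix by a vector of length nrow(ie) divides each row by its median.
flags <- ie / row_med > 4
flags
```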

Here I created the matrix dd with the first column of D (X.) in its first column, 
and then filled each remaining cell with a 1 whenever the value of D for that cell 
was greater than 4 times the mean for that row (your definition of 'outlier'). 
> dd <- rep(0,15*7)
> dim(dd) <- c(15,7)
> dd[,1]<- D[,1]
> for (i in 1:15){
+ for (j in 2:7){
+ dd[i,j] <- D[i,(j+1)]/D[i,2]>4
+ }
+ }
> dd
       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]  1108    0    0    0    0    0    0
 [2,]  1479   NA   NA    0    1    0    0
 [3,]  1591    0    0    0    0    0    0
 [4,]  3408    0    0    0    0    0    0
 [5,]  3423   NA   NA   NA    1    0    0
 [6,]  3872    0    0    0    0    0    1
 [7,]  5823    0    0    0    0    0    0
 [8,]  6051   NA   NA   NA   NA    0    0
 [9,]  8099    0    0    0    0    0    0
[10,]  8100   NA   NA   NA   NA    0    0
[11,] 10640    1    1    1    0    0    0
[12,] 12600    0    0    0    0    0    0
[13,] 14680    0    0    1    0    0    1
[14,] 14698    0    0    0    0    0    0
[15,] 17143    0    0    0    0    0    0

So, you encounter four situations:

a) as in row 2, you have an outlier preceded and followed by values
b) as in row 5, you have an outlier preceded by an NA
c) as in row 6, there is an outlier in the last column
d) as in row 11, there are two or more consecutive outliers


The replacement rule you described would only apply to situations a) (i.e. 
replacing the outlier with the mean of the preceding and subsequent values) and 
b) (replacing it with the mean for the row).
But what about situations c) and d)?

And, because this is just a chunk of a bigger dataset, you can also get an 
outlier in the first year column, followed by a number. Your rule does not 
account for this situation either.
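
To make the gaps concrete, here is a minimal sketch of one way the rule could be completed. The helper `replace_outliers` is hypothetical. It applies the neighbour mean when both neighbours exist, are not NA and are not outliers themselves, and otherwise falls back to the mean of the non-outlying values in the row. That fallback for situations c) and d) is purely an assumption made for this example; it was not part of the rule described.

```r
# Hypothetical helper: v is one row of yearly values, is_out a logical vector
# of the same length marking outliers (NA flags count as "not an outlier").
replace_outliers <- function(v, is_out) {
  is_out <- !is.na(is_out) & is_out
  # Assumption: fall back to the mean of the non-outlying, non-missing values.
  fallback <- mean(v[!is_out], na.rm = TRUE)
  out <- v
  for (j in which(is_out)) {
    has_both <- j > 1 && j < length(v)
    usable <- has_both &&
      !is.na(v[j - 1]) && !is.na(v[j + 1]) &&
      !is_out[j - 1] && !is_out[j + 1]
    out[j] <- if (usable) (v[j - 1] + v[j + 1]) / 2 else fallback
  }
  out
}

# Situation a): row X. = 1479, row mean 110, outlier between two usable values.
v <- c(NA, NA, 53, 1166, 344.8, 110)
fixed <- replace_outliers(v, v / 110 > 4)
fixed
```

Here the outlier 1166 is replaced by the mean of its two neighbours, 53 and 344.8. For a row like X. = 3423, where the preceding value is NA, the same call would use the fallback instead.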

Hope this helps,

José

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mª Teresa Martinez Soriano
Sent: 30 August 2013 09:13
To: r-help@r-project.org
Subject: [R] Outliers Help

This is a part of my data set:

> D[1:15,c(1,5:10)]

       X.      media IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
1   1108   22.00000    60.0      39     4.0     8.0    16.0     5.0
2   1479  110.00000      NA      NA    53.0  1166.0   344.8   110.0
3   1591   86.60000   247.0      87    95.0    94.0    81.0    76.0
4   3408  807.00000   302.0     322   621.0  1071.0  1301.0  1225.0
5   3423    9.00000      NA      NA      NA   410.8     7.0    11.0
6   3872  103.25000   288.6     113   116.0    90.0    94.0 12036.6
7   5823   73.00000   117.0      70    80.0    74.0    69.0    72.0
8   6051   73.00000      NA      NA      NA      NA    60.0    86.0
9   8099  125.16667   196.0     161   150.0    94.0    72.0    78.0
10  8100   70.00000      NA      NA      NA      NA    48.0    92.0
11 10640   67.33333  1256.6    1152   664.2    74.0    77.0    51.0
12 12600 2417.00000  1960.0    2383  2453.0  2506.0  2758.0  2442.0
13 14680   38.00000    30.0      61   373.6    42.0    19.0   220.8
14 14698  698.16667   553.0     664   847.0   800.0   679.0   646.0
15 17143  392.16667   323.0     322   434.0   383.0   459.0   432.0





I have done multiple imputation and now I have some outliers, which I would like 
to replace with the mean of the row or, if possible, with the mean of the 
previous and the next value in the row. For instance:

value 1 - Outlier- Value 2

I would like to replace the outlier with the mean of value 1 and value 2. The 
problem is that these values could be NA (still NA after the imputation because 
they don't exist); in that case I would like to replace the outlier with the 
mean of the row.


Another problem is detecting outliers correctly. For instance, in this example 
data set there is an outlier for X. = 3872 in IE.2010. I have thought of 
comparing the values with the mean (column media).

I have tried this code:


D<-datos[, c(1,16:24)]
m<-as.matrix(D) 
for (i in 1:nrow(D)) {
  # I would change this in the new data set, because I will have more years
  # than 2010
  for (j in 5:(ncol(D) - 1)) {
    # Here I would like to find the values that are much bigger than the mean
    # of this row, and replace them by the mean of the previous and the next
    # values of the same row.
    # if( !is.na(m[i,j])
    if (!is.na(m[i, j]) && !is.na(m[i, j + 1]) && !is.na(m[i, j - 1]) &&
        !is.na(m[i, 2]) && ((m[i, j] / m[i, 2]) > 4)) {
      m[m[i, j]] <- (m[i, j - 1] + m[i, j + 1]) / 2
    }
  }
}
D<-as.data.frame(m)

But I get the same data.frame that I had before; nothing changes.
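
One likely reason the intended cells do not change: in the assignment `m[m[i,j]] <- ...` the value stored in the cell is used as an index into `m`, so the assignment does not target position (i, j). A minimal sketch of the loop with the assignment written as `m[i, j] <- ...` follows. The small matrix is a made-up stand-in with the id in column 1, the row mean in column 2 and the yearly values from column 3 onwards, so the column range in the inner loop is an assumption that would need adjusting to the real data.

```r
# Toy matrix: column 1 = id, column 2 = row mean, columns 3:6 = yearly values.
m <- rbind(
  c(1479, 110, 53, 1166, 344.8, 110),
  c(5823,  73, 80,   74,  69.0,  72)
)

for (i in seq_len(nrow(m))) {
  # Interior value columns only, so that j - 1 and j + 1 are value columns too.
  for (j in 4:(ncol(m) - 1)) {
    if (!is.na(m[i, j]) && !is.na(m[i, j - 1]) && !is.na(m[i, j + 1]) &&
        !is.na(m[i, 2]) && m[i, j] / m[i, 2] > 4) {
      # Assign to the cell at position (i, j), not to m[m[i, j]].
      m[i, j] <- (m[i, j - 1] + m[i, j + 1]) / 2
    }
  }
}
m
```

In this toy example the value 1166 in the first row is replaced by the mean of 53 and 344.8, and the second row is left untouched.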




Any ideas are welcome.
Thanks a lot, Teresa                           

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





