Re: [R] detecting noise in data?

David Stevens Tue, 24 Jan 2012 15:49:35 -0800

Michael

One way to get an idea about 'groupiness' is a method described by Box
and Ramirez (1992), "Cumulative Score Charts", Qual. Rel. Eng. Int., 8,
17-27, and in the book by Box and Luceno (1997), Statistical Control by
Monitoring and Feedback Adjustment, Wiley Interscience, New York. This
type of discovery can feed into the prespecification of groups and the
cluster analysis, as Bert suggested, or it can stand as an independent
approach. I've done similar analyses in R, but I don't know of any
packages for it. It isn't difficult to code.
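
A minimal sketch of the cumulative-sum idea follows. It is one simple
reading of a cumulative score chart, not the specific procedure from the
references above. It assumes the observations are in collection order and
that the off-level values arrive in a run rather than being interleaved at
random. The function names and the data are made up for illustration. The
sketch is in Python; in R the core step would be cumsum(x - mean(x)).

```python
from itertools import accumulate
from statistics import fmean


def cusum(values):
    """Cumulative sum of deviations from the overall mean, in data order."""
    center = fmean(values)
    return list(accumulate(v - center for v in values))


def likely_shift_index(values):
    """Index of the largest |CUSUM| value: a rough guess at where the level changes."""
    scores = cusum(values)
    return max(range(len(scores)), key=lambda i: abs(scores[i]))


# Made-up illustration data: a run near 10 followed by a run near 14.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 14.1, 13.8, 14.2, 13.9]
scores = cusum(data)
k = likely_shift_index(data)
print("CUSUM:", [round(s, 2) for s in scores])
print("level appears to change after observation", k + 1)
```

A single level gives a CUSUM that wanders around zero. A sustained change
in slope, as in the made-up data where the sum falls through the sixth
observation and then climbs back, suggests two mean levels.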


David Stevens

On 1/24/2012 4:12 PM, Bert Gunter wrote:
> Statistical inference for group differences on groups determined from the 
> data yields incorrect results. Groups must be prespecified.
>
> Bert
>
> On Jan 24, 2012, at 2:55 PM, "HARROLD, Tim"<th...@doh.health.nsw.gov.au>  
> wrote:
>
>> You might want to provide an example? It's a pretty vague problem at the 
>> moment.
>>
>> If the data can be easily picked out by human eyes, you might want to think 
>> about the criteria you're using to pick out a contaminated result. If you 
>> can express them in such a way that you don't need to scan each observation 
>> (e.g. if a snapper weighs >= 300000 kg then somebody entered that data 
>> incorrectly) then you can create an indicator variable and continue with 
>> your analysis.
>>
>> Other than that, some sort of cluster analysis might be able to pick up on 
>> 2 distinct groups, provided there's a reasonable level of homogeneity 
>> within each group. From there, you can do a basic inference test on the 
>> group means to see whether there are significant differences between 
>> groups.
>>
>> Cheers,
>> Tim
>>
>>
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf Of Michael
>> Sent: Wednesday, 25 January 2012 9:31 AM
>> To: r-help
>> Subject: Re: [R] detecting noise in data?
>>
>> Hi all,
>>
>> I just wanted to add that I am looking for a solution that's in R ... to
>> handle this...
>>
>> And also, in a given sample, the correct data are the majority and the
>> noise is the minority.
>>
>> Thank you!
>>
>> On Tue, Jan 24, 2012 at 4:09 PM, Michael<comtech....@gmail.com>  wrote:
>>
>>> Hi all,
>>>
>>> I have data which are unfortunately contaminated by noise.
>>>
>>> We know that the noise is at a different level from the correct data, i.e.
>>> the noisy data can be easily picked out by human eyes.
>>>
>>> It looks as if two people generated two very different data sets with
>>> different mean levels, and the sets got mixed together.
>>>
>>> i.e. assuming both data sets follow an unknown distribution DF,
>>>
>>> and the two mean levels are u1 and u2... (unknown)
>>>
>>> Then the correct data are generated by DF(u1)
>>>
>>> and the noise are generated by DF(u2),
>>>
>>> and they got mixed...
>>>
>>> Now, how do I flag those suspicious data? At least is there a way I could
>>> answer the question:
>>>
>>> Given a sample of mixed data, were these data generated from the two
>>> sources described above, or were they in fact generated from one
>>> source only?
>>>
>>> i.e. are there two substantially distinct species in the given data?
>>>
>>> Thanks a lot!
>>>
>>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ______________________________________________________________________________________________________________________
>> This email has been scanned for the NSW Ministry of Health by the Websense 
>> Hosted Email Security System.
>> Emails and attachments are monitored to ensure compliance with the NSW 
>> Ministry of Health's Electronic Messaging Policy.
>> ______________________________________________________________________________________________________________________
>>
>>
>> ______________________________________________________________________________________________________________________
>> Disclaimer: This message is intended for the addressee named and may contain 
>> confidential information.
>> If you are not the intended recipient, please delete it and notify the 
>> sender.
>> Views expressed in this message are those of the individual sender, and are 
>> not necessarily the views of the NSW Ministry of Health.
>> ______________________________________________________________________________________________________________________

-- 
David K Stevens, P.E., Ph.D., Professor
Civil and Environmental Engineering
Utah Water Research Laboratory
8200 Old Main Hill
Logan, UT  84322-8200
435 797 3229 - voice
435 797 1363 - fax
david.stev...@usu.edu




