> -----Original Message-----
> From: drflxms [mailto:[EMAIL PROTECTED]
> Sent: Saturday, August 23, 2008 6:47 AM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: Re: [R] simple generation of artificial data
> with defined features
>
> Hello Mr. Greg Snow!
>
> Thank you very much for your prompt answer.
> > I don't think that the election data is the right data to
> > demonstrate Kappa; you need subjects that are classified by 2
> > or more different raters/methods.  The election data could be
> > seen as classifying the voters into which party they voted
> > for, but you only have 1 rater.
> I think it should be possible to calculate kappa if one
> takes a slightly different point of view from the one you
> described above: take the voters as raters who "judge" the
> category "election" with one party out of the six mentioned
> in my previous e-mail (which are simply the top six).
> This makes sense to me, because an election is essentially
> nothing but a survey asking "who should lead our country" -
> with six options in this example. As kappa is a measure of
> agreement, it should be able to illustrate the agreement of
> the voters' answers to this question.
> For me this is - in principle - no different from asking
> "Where is the stenosis in the video of this endoscopy?",
> offering six options, each representing an anatomic location.

Ok, rethinking it in these terms is fine (just a transpose of mine), but you 
still have the same problem of having only 1 election.  Analyzing data with
only one data point (generally 0 degrees of freedom) does not give you
much, if any, information.  Let's look at your doctors finding the stenosis and 
start with the simpler case of just 2 doctors.  If you only show them 1 video 
and ask the question once, then the 2 doctors will agree either 100% of the 
time or 0% of the time.  Is either of those numbers meaningful?  If we add more 
doctors, then we still will have either 100% agreement or 0% agreement with 
only 1 observation.  With 1 election, what can you say about the agreement?  If 
you have info on multiple elections (maybe other candidates within the same 
election), then you can measure the agreement using kappa style scores, but I 
don't think that any version of kappa is designed to work for 1 observation.  
Hence my suggestion of looking for different data to help understand the
function.
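
You can see this directly in R.  Here is a minimal sketch, assuming for
illustration the kappam.fleiss() function from the 'irr' package (swap in
whatever kappa function you are actually using; the input matrix has
subjects in rows and raters in columns):

library(irr)

# one video (1 subject) rated by two doctors who agree: observed and
# chance agreement are both 1, so kappa is 0/0 -- NaN, i.e. undefined
kappam.fleiss(matrix(c("A", "A"), nrow = 1))

# one subject, two doctors who disagree: kappa is pinned at -1
kappam.fleiss(matrix(c("A", "B"), nrow = 1))

Neither result tells you anything about how well the doctors agree in
general; for that you need many subjects.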


> > Otherwise you may want to stick with the sample datasets.
> >
> The example data sets are of excellent quality and very
> interesting. I am sure there would be brilliant examples
> among them. But I have to admit that I do not have a good
> overview of the available datasets at the moment (as a
> newbie). I just wanted to give an example out of everyday
> life that everybody is familiar with. An election is
> something that came to my mind spontaneously.

Well, the help file for the function you are using shows one sample data set,
and you can also look at the references cited on that same help page; those
could lead you to other understandable datasets.

I find that when I am trying to understand something, simulated datasets help
me: that way I know the "truth" and can see how the statistic changes for
different "truths".  You can keep the story in terms of elections to keep it
understandable to the audience, but then simulate data representing multiple
elections/offices/etc. and look at different degrees of relationship.  I would
start with pure randomness/independence (easy to simulate, and any agreement is
due to chance), then go to pure dependence (if they voted one way for the 1st
election/candidate, they always voted the same way for the rest), then look at
different levels in between (generate the 1st vote randomly, but give the 2nd
vote a 90% probability of being the same and a 10% probability of being chosen
randomly from the remaining parties), and do this for different levels of
dependence.  This should help with your understanding of how the kappa value
represents the agreement.
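
For example, here is a minimal sketch of that simulation, again assuming
kappam.fleiss() from the 'irr' package (rows are subjects, columns are
raters; the counts and the 0.9 are arbitrary choices for illustration):

library(irr)

parties    <- LETTERS[1:6]   # six "parties" (the categories)
n.subjects <- 20             # elections/offices/videos (rows)
n.raters   <- 10             # voters/doctors (columns)

# 1. pure independence: every rating is random, so any agreement is
#    due to chance and kappa should be near 0
indep <- matrix(sample(parties, n.subjects * n.raters, replace = TRUE),
                nrow = n.subjects)
kappam.fleiss(indep)

# 2. pure dependence: every rater copies the first rating, so
#    agreement is perfect and kappa should be 1
first <- sample(parties, n.subjects, replace = TRUE)
dep   <- matrix(first, nrow = n.subjects, ncol = n.raters)
kappam.fleiss(dep)

# 3. partial dependence: each rating agrees with 'first' with
#    probability 0.9, otherwise it is drawn at random from the
#    remaining parties; kappa should land somewhere in between
partial <- sapply(seq_len(n.raters), function(j) {
  agree <- runif(n.subjects) < 0.9
  other <- sapply(first, function(p) sample(setdiff(parties, p), 1))
  ifelse(agree, first, other)
})
kappam.fleiss(partial)

Rerun step 3 with the 0.9 replaced by other values to see how kappa tracks
the level of dependence.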

> > There are other packages that compute Kappa values as well
> (I don't know if others calculate this particular version),
> but some of those take the summary data as input rather than
> the raw data, which may be easier if you just have the summary tables.
> >
> >
> I chose Fleiss' Kappa because it is a more general form of
> Cohen's Kappa, allowing m raters and n categories (instead of
> only two raters and two categories when using Cohen's kappa).
> Looking for another package that calculates it from summary
> tables might be the simplest solution to my problem. Thank
> you very much for this hint!
> On the other hand, it would be nice to use the very same
> method for the example as for the "real" data. The example
> will be part of the "methods" section.
>
> Thank you again very much for your tips and the quick reply.
> Have a nice weekend!
> Greetings from Munich,
>
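
P.S. If you do end up with only summary tables, Fleiss' kappa is simple to
compute directly from an N x k table of counts using the textbook formula
(Fleiss 1971).  A minimal sketch, not tied to any particular package (the
function name is just for illustration):

fleiss.kappa.from.counts <- function(counts) {
  # counts: N x k matrix, counts[i, j] = number of raters who assigned
  # subject i to category j; every row must sum to the same n
  n    <- sum(counts[1, ])                        # raters per subject
  N    <- nrow(counts)
  P.i  <- (rowSums(counts^2) - n) / (n * (n - 1)) # per-subject agreement
  P.e  <- sum((colSums(counts) / (N * n))^2)      # chance agreement
  (mean(P.i) - P.e) / (1 - P.e)                   # Fleiss' kappa
}

# toy table: 3 subjects, 6 raters each, 3 categories
tab <- rbind(c(6, 0, 0), c(2, 2, 2), c(0, 3, 3))
fleiss.kappa.from.counts(tab)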


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
