Dear Mr. Christos Hatzis,

thank you so much for your answer, which is in my eyes just brilliant! I followed it step by step (great and detailed explanation) and nearly everything is fine, except for a problem at the very end that I have not found a solution for so far (despite playing around quite a lot). Please let me explain:

> election.2005 <- c(16194,13136,3494,3838,4648,4118) # cut off the last 3 digits, because my laptop can't handle millions of rows...
> attr(election.2005, "class") <- "table"
> attr(election.2005, "dim") <- c(1,6)
> attr(election.2005, "dimnames") <- list(c("votes"), c("spd", "cdu", "csu", "gruene", "fdp", "pds"))
> head(election.2005)
        spd   cdu  csu gruene  fdp  pds
votes 16194 13136 3494   3838 4648 4118
> el.dt <- as.data.frame(election.2005)
> el.dt.exp <- el.dt[rep(1:nrow(el.dt), el.dt$Freq), -ncol(el.dt)]
> dim(el.dt.exp)
[1] 45428     2
> head(el.dt.exp)
     Var1 Var2
1   votes  spd
1.1 votes  spd
1.2 votes  spd
1.3 votes  spd
1.4 votes  spd
1.5 votes  spd

My problem now is that I need either an auto-incrementing identifier instead of "votes" in Var1, or a way to access the row numbering through a column name (e.g. Var0). In addition, I need a third variable for the year of the election (2005, which is the same for all rows, but needed later on). So this is what it should look like:

    voter.id party election.year
1          1   spd          2005
1.1        2   spd          2005
1.2        3   spd          2005
1.3        4   spd          2005
1.4        5   spd          2005
1.5        6   spd          2005

The reason for this is the input format of the kappam.fleiss function of the irr package, which I use for the calculation. It accepts a data.frame with the rated subjects as rows (here we would have only one: the year of the election) and the raters (here the voters) as columns. Each cell of the data.frame holds the chosen party for that combination of election year and voter. This format can easily be achieved using the reshape package. Assuming voter.id is an auto-incrementing identifier, the commands should be:

> library(reshape)
> el.dt.exp.molten <- melt(el.dt.exp, id=c("voter.id")) # which would probably not really change anything in this case, because the data is already in a "molten" form
> kappa.frame <- cast(el.dt.exp.molten, election.year ~ voter.id, subset=variable=="party")

I would be extremely happy if you could help me out again! Have a nice weekend and many thanks so far!
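
For reference, a minimal sketch of one way to get the three-column layout described above, using base R only. It is not taken from the earlier answer: it skips the intermediate table object and repeats the party labels directly. It assumes that a plain running number is acceptable as voter.id; the object names votes and voters are made up for this illustration.

```r
# Election results in thousands of votes (last 3 digits cut off, as above)
votes <- c(spd = 16194, cdu = 13136, csu = 3494,
           gruene = 3838, fdp = 4648, pds = 4118)

# One row per artificial voter:
#  - voter.id is a running number from 1 to the total number of votes
#  - party repeats each party label as often as it received votes
#  - election.year is the same constant for every row
voters <- data.frame(
  voter.id      = seq_len(sum(votes)),
  party         = factor(rep(names(votes), times = votes),
                         levels = names(votes)),
  election.year = 2005
)

head(voters)          # first rows: voter.id 1, 2, 3, ... all with party spd
nrow(voters)          # 45428, the same row count as dim(el.dt.exp)
table(voters$party)   # reproduces the original frequencies
```

Because data.frame() assigns default row names here, the rows are numbered 1, 2, 3, ... rather than 1, 1.1, 1.2, ..., so the separate voter.id column and the row names agree.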

Greetings from Munich,
Felix Mueller-Sarnowski

Christos Hatzis wrote:
> On the general question on how to create a dataset that matches the
> frequencies in a table, function as.data.frame can be useful. It takes as
> argument an object of a class 'table' and returns a data frame of
> frequencies.
>
> Consider for example table 6.1 of Fleiss et al (3rd Ed):
>
> > birth.weight <- c(10,15,40,135)
> > attr(birth.weight, "class") <- "table"
> > attr(birth.weight, "dim") <- c(2,2)
> > attr(birth.weight, "dimnames") <- list(c("A", "Ab"), c("B", "Bb"))
> > birth.weight
>     B  Bb
> A  10  40
> Ab 15 135
> > summary(birth.weight)
> Number of cases in table: 200
> Number of factors: 2
> Test for independence of all factors:
>         Chisq = 3.429, df = 1, p-value = 0.06408
> > bw.dt <- as.data.frame(birth.weight)
>
> Observations (rows) in this table can then be replicated according to their
> corresponding frequencies to yield the expanded dataset that conforms with
> the original table.
>
> > bw.dt.exp <- bw.dt[rep(1:nrow(bw.dt), bw.dt$Freq), -ncol(bw.dt)]
> > dim(bw.dt.exp)
> [1] 200   2
> > table(bw.dt.exp)
>     Var2
> Var1  B  Bb
>   A  10  40
>   Ab 15 135
>
> The above approach is not restricted to 2x2 tables, and it should be
> straightforward to generate datasets that conform to arbitrary nxm
> frequency tables.
>
> -Christos Hatzis
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow
>> Sent: Friday, August 22, 2008 12:41 PM
>> To: drflxms; r-help@r-project.org
>> Subject: Re: [R] simple generation of artificial data with
>> defined features
>>
>> I don't think that the election data is the right data to
>> demonstrate Kappa, you need subjects that are classified by 2
>> or more different raters/methods. The election data could be
>> considered classifying the voters into which party they voted
>> for, but you only have 1 rater.
>> Maybe if you had some survey
>> data that showed which party each voter voted for in 2 or
>> more elections, then that may be a good example dataset.
>> Otherwise you may want to stick with the sample datasets.
>>
>> There are other packages that compute Kappa values as well (I
>> don't know if others calculate this particular version), but
>> some of those take the summary data as input rather than the
>> raw data, which may be easier if you just have the summary tables.
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> [EMAIL PROTECTED]
>> (801) 408-8111
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of drflxms
>>> Sent: Friday, August 22, 2008 6:12 AM
>>> To: r-help@r-project.org
>>> Subject: [R] simple generation of artificial data with defined
>>> features
>>>
>>> Dear R-colleagues,
>>>
>>> I am quite a newbie to R fighting my stupidity to solve a probably
>>> quite simple problem of generating artificial data with defined
>>> features.
>>>
>>> I am conducting a study of inter-observer-agreement in
>>> child-bronchoscopy. One of the most important measures is Kappa
>>> according to Fleiss, which is very comfortable available in R through
>>> the irr-package.
>>> Unfortunately medical doctors like me don't really understand much of
>>> statistics. Therefore I'd like to give the reader an easy
>>> understandable example of Fleiss-Kappa in the Methods part. To achieve
>>> this, I obtained a table with the results of the German election from
>>> 2005:
>>>
>>> party     number of votes   percent
>>>
>>> SPD       16194665          34,2
>>> CDU       13136740          27,8
>>> CSU        3494309           7,4
>>> Gruene     3838326           8,1
>>> FDP        4648144           9,8
>>> PDS        4118194           8,7
>>>
>>> I want to show the agreement of voters measured by Fleiss-Kappa.
>>> To calculate this with the kappam.fleiss-function of irr, I need a
>>> data.frame like this:
>>>
>>>          (id of 1st voter)   (id of 2nd voter)
>>>
>>> party    spd                 cdu
>>>
>>> Of course I don't plan to calculate this with the million of cases
>>> mentioned in the table above (I am working on a small laptop). A
>>> division by 1000 would be more than perfect for this example. The
>>> exact format of the table is generally not so important, as I could
>>> reshape nearly every format with the help of the reshape-package.
>>>
>>> Unfortunately I could not figure out how to create such a
>>> fictive/artificial dataset as described above. Any data.frame would be
>>> nice, that keeps at least the percentage. String-IDs of parties could
>>> be substituted by numbers of course (would be even better for function
>>> kappam.fleiss in irr!).
>>>
>>> I would appreciate any kind of help very much indeed.
>>> Greetings from Munich,
>>>
>>> Felix Mueller-Sarnowski
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.