Jim Lemon wrote on 09/20/2011 04:15:46 AM:
>
> On 09/19/2011 04:46 PM, Henri-Paul Indiogine wrote:
> > Greetings!
> >
> > I am using the R library RQDA to assign certain codes to paragraphs of
> > documents in a collection. Several paragraphs are assigned more than
> > 1 code. E.g. often the codes "poverty" and "education" will be
> > assigned to the same paragraph. Often also "math" and "career" will
> > be given to the same paragraphs. Other codes are never given to the
> > same paragraphs.
> >
> > I would like to calculate the relationship or "closeness" of certain
> > codes. RQDA will generate a cross-codes table. It has the form of an
> > upper triangular matrix where the upper triangle has the number of
> > cross occurrences of 2 codes at their intersection. The lower
> > triangle is filled with NA. The diagonal simply has the number of
> > occurrences of the codes by themselves.
> >
> > The row names are the names of the codes and the column names are the
> > IDs of the codes. E.g.
> >
> >          1  2  3  4
> > code1    3  0  2  1
> > code2   NA  4  1  0
> > code3   NA NA  2  0
> > code4   NA NA NA  3
> >
> > We can see that code1 is associated 2 out of 3 times with code3.
> > Code2 is present 1 out of 4 times with code3. Code2 is never assigned
> > to the same paragraph as code1 or code4, and so on.
> >
> > I am trying to understand how to create some sort of graph or diagram
> > to represent this. Should I use a cluster diagram or a network graph?
> > Also, what sort of R code could I use?
>
> Hi Henri,
> The intersectDiagram function in the plotrix package displays the
> intersections of sets as rectangles with widths (and areas) proportional
> to the number of members of each set intersection. This may be a way for
> you to represent your codes. For your example, you could proceed like
> this. Create a file ("hp.csv") containing the following:
>
> paragraph,attribute
> p1,code1
> p1,code3
> p2,code1
> p2,code3
> p3,code1
> p3,code4
> p4,code2
> p5,code2
> p6,code2
> p7,code2
> p7,code3
> p8,code3
> p9,code3
> p10,code4
> p11,code4
> p12,code4
>
> then:
>
> library(plotrix)
> hp<-read.csv("hp.csv")
> intersectDiagram(hp,main="Combinations of codes")
>
> There are other ways to represent your original data that
> intersectDiagram will read in that you might like to try.
>
> Jim

Another approach would be to redefine the cross-codes table as distances.
For example, if the cross-codes table is a matrix called m ...

   # convert to "distances": each row is divided by that code's own count
   d <- 1 - m/diag(m)
   # fill in the complete matrix by mirroring the upper triangle
   # (t(d) is needed here: d[upper.tri(d)] lists the cells in a different
   # order than lower.tri(d) expects, so it would not give a symmetric matrix)
   d[lower.tri(d)] <- t(d)[lower.tri(d)]
   # use multidimensional scaling to represent the distances in two
   # dimensions
   twodim <- cmdscale(d)
   plot(twodim, type="n")
   text(twodim, rownames(twodim))

Jean
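
For anyone who wants to try the distance approach without RQDA at hand, here is a minimal sketch that types in the 4 x 4 example table from the original post and runs it through the same steps. The matrix is simply the example from the question; the dendrogram at the end is an extra suggestion (an assumption about what "cluster diagram" could mean here, not something proposed in the thread), and "average" linkage is an arbitrary choice.

```r
# Cross-codes table from the original post: counts on and above the
# diagonal, NA below. Row names are code names, column names are code IDs.
m <- matrix(c( 3,  0,  2, 1,
              NA,  4,  1, 0,
              NA, NA,  2, 0,
              NA, NA, NA, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("code", 1:4), as.character(1:4)))

# Convert counts to "distances": row i is divided by code i's own count,
# so codes that always co-occur get 0 and codes that never do get 1.
d <- 1 - m / diag(m)

# Mirror the upper triangle into the NA-filled lower triangle.
d[lower.tri(d)] <- t(d)[lower.tri(d)]

# Classical multidimensional scaling into two dimensions, then label
# each point with its code name.
twodim <- cmdscale(d)
plot(twodim, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(twodim, labels = rownames(twodim))

# Assumption, not from the thread: the same distances can also feed a
# hierarchical clustering if a dendrogram is preferred.
hc <- hclust(as.dist(d), method = "average")
plot(hc, main = "Codes clustered by co-occurrence")
```

With this table the smallest distance is between code1 and code3 (1 - 2/3 = 1/3), so those two should sit closest together in both plots. Note that the measure is only made symmetric by the mirroring step: each pair is scaled by the count of the code in the earlier row, which may or may not be what you want.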