To R-help users:

I want to use ggplot to plot summary statistics on the frequency of letters from a page of text. My data frame has four columns:
(1) The line number [1 to 30]
(2) The letter [a to z]
(3) The frequency of the letter [assuming there are 80 letters per line]
(4) The factor 'type': bad or good (purely artificial factor)

I want to achieve the following plot:

(a) A bar plot with the letters on the x-axis and, on the y-axis, each letter's frequency summed over the 30 lines.
(b) Each bar (for a letter) split into two bars, one for type 'good' and one for type 'bad'.
(c) Only the union of the top 8 most frequently used letters of the two types 'good' and 'bad' displayed.

By point (c) I mean: if a,e,f,h,i,t,s,r are the most frequent letters of type 'good' and a,e,f,h,i,m,l,p are the most frequent letters of type 'bad', then I would like my plot to feature the letters a,e,f,h,i,t,s,r,m,l,p.

Here is my code:

library(ggplot2)

# There will be 30 lines and we want to record the frequency of each letter
# on each line.
lines <- rep(1:30, each = 26)
letter <- rep(letters, times = 30)

# We have taken the letter frequencies from
# http://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
freq <- c(8.12, 1.49, 2.71, 4.32, 12.02, 2.30, 2.03, 5.92, 7.31, 0.10, 0.69,
          3.98, 2.61, 6.95, 7.68, 1.82, 0.11, 6.02, 6.28, 9.10, 2.88, 1.11,
          2.09, 0.17, 2.11, 0.07)
freq <- freq / 100

# We assume each line contains 80 letters and change the seed for each line
# for variability.
letterfreq <- integer()
for (i in 1:30) {
  set.seed(i)
  s <- data.frame(sample(letters, size = 80, replace = TRUE, prob = freq))
  names(s) <- "ltr"
  # Keep all 26 levels so that letters absent from a line get a count of 0.
  s$ltr <- factor(s$ltr, levels = letters)
  frq <- as.data.frame(table(s))
  letterfreq <- append(letterfreq, frq$Freq)
}
ltrfreq <- data.frame(lines, letter, letterfreq)

# Add an artificial factor column _type_: good/bad. So each pair
# (line, letter) has type 'good' or 'bad' with equal probability.
# Set the seed for reproducibility.
set.seed(999)
ltrfreq$type <- factor(sample(c("good", "bad"), size = 780, replace = TRUE,
                              prob = c(0.5, 0.5)))

# Here is the plot I want but this includes all 26 letters.
ggplot(ltrfreq, aes(x = factor(letter), y = letterfreq, fill = type, color = type)) +
  stat_summary(fun.y = sum, position = position_dodge(), geom = "bar")

Best regards,
Kieran.
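
For point (c), one possible way to pick the letters is sketched below in base R. It is only a sketch under stated assumptions: the data frame is a compact stand-in for ltrfreq with the same column names (built with rmultinom() rather than the seeded loop above, so the counts differ), and the names totals, top_by_type, keep and top are made up for illustration. Ties at the eighth position are broken by row order, which may or may not be what is wanted.

```r
# Stand-in for ltrfreq: same columns as in the post, different random draws.
set.seed(1)
freq <- c(8.12, 1.49, 2.71, 4.32, 12.02, 2.30, 2.03, 5.92, 7.31, 0.10, 0.69,
          3.98, 2.61, 6.95, 7.68, 1.82, 0.11, 6.02, 6.28, 9.10, 2.88, 1.11,
          2.09, 0.17, 2.11, 0.07) / 100
ltrfreq <- data.frame(
  lines  = rep(1:30, each = 26),
  letter = rep(letters, times = 30),
  # rmultinom() returns a 26 x 30 matrix (letters x lines); as.vector()
  # reads it column by column, matching the ordering of lines and letter.
  letterfreq = as.vector(rmultinom(30, size = 80, prob = freq))
)
ltrfreq$type <- factor(sample(c("good", "bad"), nrow(ltrfreq), replace = TRUE))

# Total frequency for each (letter, type) pair.
totals <- aggregate(letterfreq ~ letter + type, data = ltrfreq, FUN = sum)

# The 8 letters with the largest totals within each type.
top_by_type <- lapply(split(totals, totals$type), function(d) {
  head(d$letter[order(d$letterfreq, decreasing = TRUE)], 8)
})

# Union over both types: somewhere between 8 and 16 letters.
keep <- sort(unique(unlist(top_by_type)))

# Rows of the original data restricted to those letters.
top <- ltrfreq[ltrfreq$letter %in% keep, ]
```

Passing top instead of ltrfreq to the ggplot call above should then restrict the bars to the letters in keep, since factor(letter) only creates levels for the letters that are still present.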