Thank you very much for your code and your explanation. I ran it with my data, and it worked well!
I am so happy that your R code and package helped me make these plots. I was baffled by this task and had been searching for a solution. I thought I was missing some arguments to ggplot, but couldn't figure out which ones. I almost turned to Python and HTML to solve it, although I believed there must be a way to do it in R.

It would be great if you could include it as a function in your plotrix package, as I have seen other people asking for it, too.

Best regards,
Zhao

2013/11/2 Jim Lemon <j...@bitwrit.com.au>

> On 11/02/2013 10:35 AM, Zhao Jin wrote:
>
>> Dear all,
>>
>> I am trying to make a series of waffle plot-like figures for my data to
>> visualize the ratios of amino acid residues at each position. For each one
>> of 37 positions, there may be one to four different amino acid residues. So
>> the data consist of the positions, what residues are there, and the ratios
>> of residues. The ratios of residues at a position add up to 100, or close
>> to 100 (more on this soon)*. I am hoping to make a *square* waffle
>> plot-like figure for each position, and fill the 10 X 10 grids with colors
>> representing each amino acid residue and areas for grids of a certain color
>> corresponding to the ratio of that residue. Then I could line up all the
>> plots in one row from position 1 to position 37.
>> *: if the sum of the ratios is less than 100 at a position, that's because
>> of an unknown residue which I did not include in the table.
>>
>> I am attaching the dput output for my data here:
>> structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
>> 8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
>> 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
>> 26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
>> 36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
>> 7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
>> 14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
>> 15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
>> 12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
>> "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
>> "Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
>> 100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
>> 1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
>> 98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
>> 100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)),
>> .Names = c("position", "residue", "ratio"),
>> class = "data.frame", row.names = c("1",
>> "2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
>> "17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
>> "32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
>> "45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
>> "58", "59", "60", "61", "62", "63", "64", "65"))
>>
>> Inspired by a statexchange post, I am using these scripts to make the
>> plots:
>> library(ggplot2)
>> col4=c('#E66101','#FDB863','#B2ABD2','#5E3C99')
>> dflist=list()
>> for (i in 1:37){
>> residue_num=length(which(df$position==i))
>> dflist[[i]]=df[df$position==i,2:3]
>> waffle=expand.grid(y=1:residue_num,
>>   x=seq_len(ceiling(sum(dflist[[i]]$ratio)/residue_num)))
>> residuevec=rep(dflist[[i]]$residue,dflist[[i]]$ratio)
>> waffle$residue=c(as.vector(residuevec),
>>   rep(NA,nrow(waffle)-length(residuevec)))
>> png(paste('plot',i,'.png',sep=''))
>> print(ggplot(waffle, aes(x = x, y = y, fill = residue)) +
>>   geom_tile(color = "white") +
>>   scale_fill_manual("residue",values = col4) + coord_equal() +
>>   theme(panel.grid.minor=element_blank(),panel.grid.major=element_blank()) +
>>   theme(axis.ticks=element_blank()) +
>>   theme(axis.text.x=element_blank(),axis.text.y=element_blank()) +
>>   theme(axis.title.x=element_blank(),axis.title.y=element_blank())
>> )
>> dev.off()}
>>
>> With my scripts, I could make a waffle plot, but not a *square* 10 X 10
>> waffle plot. Also, the grid size differs for positions with different
>> numbers of residues. I am suspecting that I didn't use coord_equal()
>> correctly.
>>
>> So I wonder how I can make the plots like I described above in ggplot2 or
>> with some other packages. Also, is there a way to assign a color to
>> different residues, say, purple for alanine, blue for glycine, etc, and
>> incorporate that information in the for loop?
>>
> Hi Zhao,
> By beginning with a 10x10 matrix of NA values and then replacing some of
> them with a color, I think you can do what you want. First you need a
> function to fill one corner of your matrix with values, leaving the rest
> uncolored (i.e.
> NA):
>
> fill.corner<-function(x,nrow,ncol) {
>  xlen<-length(x)
>  if(nrow*ncol >= xlen) {
>   newmat<-matrix(NA,nrow=nrow,ncol=ncol)
>   xside<-1
>   while(xside*xside < xlen) xside<-xside+1
>   row=1
>   col=1
>   for(xindex in 1:xlen) {
>    newmat[row,col]<-x[xindex]
>    if(row == xside) {
>     col<-col+1
>     row<-1
>    }
>    else row<-row+1
>   }
>   return(newmat)
>  }
>  cat("Too many values in x for",nrow,"by",ncol,"\n")
> }
>
> Then you have to massage your data frame into 37 smaller data frames,
> create matrices with the values and colors to display on your 37 waffle
> plots:
>
> library(plotrix)
> # get an "alphabet" of colors
> alphacol<-rainbow(18)
> # the actual values in the plotted matrix don't matter
> fakemat<-matrix(1:100,nrow=10)
> # pick off the positions one by one
> for(pos in 1:37) {
>  posdf<-zjdat[zjdat$position == pos,]
>  for(res in 1:dim(posdf)[1]) {
>   if(res == 1)
>    rescol<-rep(alphacol[as.numeric(posdf$residue[res])],
>     posdf$ratio[res])
>   else
>    rescol<-c(rescol,rep(alphacol[as.numeric(posdf$residue[res])],
>     posdf$ratio[res]))
>  }
>  if(!is.null(resmat<-fill.corner(rescol,10,10)))
>   color2D.matplot(fakemat,border="lightgray",cellcolors=resmat,
>    yrev=FALSE,main=c(pos,length(resmat)))
> }
>
> That might get you started. In fact, I might even write a waffle plot
> function for plotrix.
>
> Jim

--
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675
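
On the question of giving each residue a fixed color (purple for alanine, blue for glycine, and so on), one possible approach is a named color vector indexed by residue letter. What follows is a minimal base-R sketch under stated assumptions, not code from the thread: `residue_cols`, `waffle_matrix` and `plot_waffle` are hypothetical names, the color choices are arbitrary, and the example values are position 10 of the posted data (A = 79, E = 9, with the remaining 12 cells left for the unknown residue).

```r
# Hypothetical fixed palette: one named color per residue letter.
# The colors are illustrative; extend the vector to all 18 residues as needed.
residue_cols <- c(A = "purple", G = "blue", E = "orange")

# Build a side x side matrix of colors for one position.
# `residues` and `ratios` are parallel vectors. Cells left over when the
# ratios sum to less than side^2 stay NA (the unknown residue).
waffle_matrix <- function(residues, ratios, pal, side = 10) {
  stopifnot(length(residues) == length(ratios),
            sum(ratios) <= side * side)
  # look up each residue's color by name, then repeat it `ratio` times
  cols <- rep(unname(pal[as.character(residues)]), times = ratios)
  cells <- c(cols, rep(NA_character_, side * side - length(cols)))
  matrix(cells, nrow = side, ncol = side)  # filled column by column
}

# Draw one waffle with base graphics; asp = 1 keeps the cells square.
plot_waffle <- function(m, border = "lightgray", na_col = "white") {
  plot.new()
  plot.window(xlim = c(0, ncol(m)), ylim = c(0, nrow(m)), asp = 1)
  fill <- ifelse(is.na(m), na_col, m)
  # one rectangle per cell; row 1 is drawn at the bottom
  rect(col(m) - 1, row(m) - 1, col(m), row(m), col = fill, border = border)
}

# Position 10 of the posted data: A = 79, E = 9 (sum 88, so 12 cells stay NA)
m <- waffle_matrix(c("A", "E"), c(79, 9), residue_cols)
```

Calling `plot_waffle(m)` draws the grid on the current device. Because the grid is always 10 by 10 and the aspect ratio is fixed, the cell size no longer depends on how many residues a position has, and the same letter maps to the same color in every plot.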