Thank you very much for your code and your explanations. I ran it with
my data, and it worked well!

I am so happy that your R code and package could help me make these
plots. I was baffled by this task and had been searching for a solution.
I thought I was missing some arguments to ggplot, but couldn't figure out
which ones. I almost turned to Python and HTML to solve it, although I
believed there must be a way to do it in R. It would be great if you could
include it as a function in your plotrix package, as I have seen other
people asking for it, too.
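
In case it helps anyone else searching for this, here is a rough
base-graphics sketch of what such a function might look like. The name
waffle_sketch and its arguments are hypothetical (nothing here is part of
plotrix), and the colors are arbitrary choices for illustration:

```r
# Hypothetical helper, not part of plotrix: build (and optionally draw)
# one n-by-n waffle panel.
# counts: named numeric vector of cell counts per category (sum <= n*n)
# cols:   named character vector of colors, one per category name
waffle_sketch <- function(counts, cols, n = 10, draw = TRUE, ...) {
  stopifnot(sum(counts) <= n * n, all(names(counts) %in% names(cols)))
  # one color per cell, padded with NA so unaccounted cells stay empty
  cellcol <- unname(rep(cols[names(counts)], counts))
  cellcol <- c(cellcol, rep(NA, n * n - length(cellcol)))
  mat <- matrix(cellcol, nrow = n, ncol = n)  # filled column by column
  if (draw) {
    # asp = 1 keeps the cells square regardless of the device size
    plot(0, 0, type = "n", xlim = c(0, n), ylim = c(0, n), asp = 1,
         axes = FALSE, xlab = "", ylab = "", ...)
    # cell [i, j] is drawn in column j, row i counted from the bottom
    rect(col(mat) - 1, row(mat) - 1, col(mat), row(mat),
         col = mat, border = "lightgray")
  }
  invisible(mat)
}

# Position 9 from my data: I = 10, L = 87, P = 3
cols <- c(I = "#E66101", L = "#FDB863", P = "#5E3C99")
m <- waffle_sketch(c(I = 10, L = 87, P = 3), cols, draw = FALSE)
```

With draw = TRUE (the default), calling it once per position after
par(mfrow = ...) should line the panels up in a row.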

Best regards,
Zhao


2013/11/2 Jim Lemon <j...@bitwrit.com.au>

> On 11/02/2013 10:35 AM, Zhao Jin wrote:
>
>> Dear all,
>>
>> I am trying to make a series of waffle plot-like figures to visualize
>> the ratios of amino acid residues at each position in my data. Each of
>> the 37 positions may have one to four different amino acid residues, so
>> the data consist of the positions, the residues present, and the ratios
>> of those residues. The ratios at a position add up to 100, or close to
>> 100 (see the note below). I am hoping to make a *square* waffle
>> plot-like figure for each position, filling a 10 x 10 grid with colors
>> representing each amino acid residue, so that the number of cells of a
>> given color corresponds to the ratio of that residue. Then I could line
>> up all the plots in one row from position 1 to position 37.
>>
>> Note: if the sum of the ratios is less than 100 at a position, that is
>> because of an unknown residue which I did not include in the table.
>>
>> I am attaching the dput output for my data here:
>> structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
>> 8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
>> 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
>> 26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
>> 36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
>> 7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
>> 14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
>> 15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
>> 12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
>> "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
>> "Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
>> 100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
>> 1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
>> 98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
>> 100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names =
>> c("position",
>> "residue", "ratio"), class = "data.frame", row.names = c("1",
>> "2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
>> "17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
>> "32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
>> "45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
>> "58", "59", "60", "61", "62", "63", "64", "65"))
>>
>> Inspired by a Stack Exchange post, I am using this script to make the
>> plots:
>> library(ggplot2)
>> col4 = c('#E66101', '#FDB863', '#B2ABD2', '#5E3C99')
>> dflist = list()
>> for (i in 1:37) {
>>   residue_num = length(which(df$position == i))
>>   dflist[[i]] = df[df$position == i, 2:3]
>>   waffle = expand.grid(y = 1:residue_num,
>>     x = seq_len(ceiling(sum(dflist[[i]]$ratio) / residue_num)))
>>   residuevec = rep(dflist[[i]]$residue, dflist[[i]]$ratio)
>>   waffle$residue = c(as.vector(residuevec),
>>     rep(NA, nrow(waffle) - length(residuevec)))
>>   png(paste('plot', i, '.png', sep = ''))
>>   print(ggplot(waffle, aes(x = x, y = y, fill = residue)) +
>>     geom_tile(color = "white") +
>>     scale_fill_manual("residue", values = col4) +
>>     coord_equal() +
>>     theme(panel.grid.minor = element_blank(),
>>       panel.grid.major = element_blank()) +
>>     theme(axis.ticks = element_blank()) +
>>     theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
>>     theme(axis.title.x = element_blank(), axis.title.y = element_blank()))
>>   dev.off()
>> }
>>
>> With my script, I could make a waffle plot, but not a *square* 10 x 10
>> waffle plot. Also, the grid size differs for positions with different
>> numbers of residues. I suspect that I didn't use coord_equal()
>> correctly.
>>
>> So I wonder how I can make the plots like I described above in ggplot2 or
>> with some other packages. Also, is there a way to assign a color to
>> different residues, say, purple for alanine, blue for glycine, etc, and
>> incorporate that information in the for loop?
>>
> Hi Zhao,
> By beginning with a 10x10 matrix of NA values and then replacing some of
> them with a color, I think you can do what you want. First you need a
> function to fill one corner of your matrix with values, leaving the rest
> uncolored (i.e. NA):
>
> fill.corner<-function(x,nrow,ncol) {
>  xlen<-length(x)
>  # the values fit as long as there are no more of them than cells
>  if(nrow*ncol >= xlen) {
>   newmat<-matrix(NA,nrow=nrow,ncol=ncol)
>   # side of the smallest square that holds all the values
>   xside<-1
>   while(xside*xside < xlen) xside<-xside+1
>   row<-1
>   col<-1
>   # fill that square column by column
>   for(xindex in seq_len(xlen)) {
>    newmat[row,col]<-x[xindex]
>    if(row == xside) {
>     col<-col+1
>     row<-1
>    }
>    else row<-row+1
>   }
>   return(newmat)
>  }
>  cat("Too many values in x for",nrow,"by",ncol,"\n")
>  return(NULL)
> }
>
> Then you have to massage your data frame into 37 smaller data frames and
> create matrices with the values and colors to display on your 37 waffle
> plots:
>
> library(plotrix)
> # zjdat is assumed to hold the data frame from the dput() output above
> # get an "alphabet" of colors, one per residue level
> alphacol<-rainbow(18)
> # the actual values in the plotted matrix don't matter
> fakemat<-matrix(1:100,nrow=10)
> # pick off the positions one by one
> for(pos in 1:37) {
>  posdf<-zjdat[zjdat$position == pos,]
>  # repeat each residue's color as many times as its ratio
>  rescol<-rep(alphacol[as.numeric(posdf$residue)],posdf$ratio)
>  if(!is.null(resmat<-fill.corner(rescol,10,10)))
>   color2D.matplot(fakemat,border="lightgray",cellcolors=resmat,
>    yrev=FALSE,main=c(pos,length(resmat)))
> }
>
> That might get you started. In fact, I might even write a waffle plot
> function for plotrix.
>
> Jim
>
>
>
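
On the ggplot2 route in the original question: the grid there is built with
y = 1:residue_num, so it has as many rows as the position has residues and
is not a 10 x 10 square. A minimal sketch of one way around that, which
always builds a 10 x 10 grid and pads it with NA, is below. The helper name
waffle_grid and the color assignments in rescols are assumptions made for
illustration only:

```r
# Hypothetical helper: one row per cell of an n-by-n grid for one position.
waffle_grid <- function(residue, ratio, n = 10) {
  stopifnot(sum(ratio) <= n * n)
  grid <- expand.grid(y = seq_len(n), x = seq_len(n))  # y varies fastest
  cells <- rep(as.character(residue), ratio)
  # pad with NA so residues left out of the table show up as empty cells
  grid$residue <- c(cells, rep(NA_character_, n * n - length(cells)))
  grid
}

# Assumed color assignments; extend with one entry per residue letter.
rescols <- c(A = "purple", G = "blue", E = "#E66101", I = "#FDB863",
             L = "#B2ABD2", P = "#5E3C99")

# Position 10 from the posted data: A = 79, E = 9 (12 cells unknown).
w10 <- waffle_grid(c("A", "E"), c(79, 9))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(w10, aes(x = x, y = y, fill = residue)) +
    geom_tile(color = "white") +
    # a named values vector ties each residue to the same color in every plot
    scale_fill_manual("residue", values = rescols, na.value = "grey90") +
    coord_equal() +
    theme_void()
}
```

Because every position gets the same 10 x 10 grid, the cell size should no
longer depend on how many residues a position has.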


-- 
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
