Sorry. Typo in my previous. Should be:

> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
$X1
         L          R          T
0.91491320 0.03675651 0.04833030

$X2
        E         M
0.9827278 0.0172722

$X3
        N         Y
0.0483303 0.9516697

$X4
        I         L         Q
0.8976410 0.0850868 0.0172722

$X5
        I         V
0.9516697 0.0483303

$X6
         P          S
0.96324349 0.03675651

$X7
        D         E         G
0.8976410 0.0540287 0.0483303

$X8
        A         C
0.9827278 0.0172722

On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <bgun...@gene.com> wrote:
> OK, I admit it: I re-read what you wrote and now I'm confused. Is:
>
>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
>
>             X1  X2        X3    X4  X5  X6    X7  X8
> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
>
> what you want?
>
> -- Bert
> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgun...@gene.com> wrote:
>> The OP's request is a bit ambiguous to me: at a given residue, do you
>> wish to calculate the proportions for only those amino acids that
>> appear at that residue, or do you wish to include the proportions for
>> all amino acids, some of which might then be 0?
>>
>> Assuming the former, then I don't think one needs to go to the lengths
>> described by John below.
>>
>> Using your example (thanks!), the following seems to suffice:
>>
>>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>>
>> $X1
>> x
>>    L    R    T
>> 0.50 0.25 0.25
>>
>> $X2
>> x
>>    E    M
>> 0.75 0.25
>>
>> $X3
>> x
>>    N    Y
>> 0.25 0.75
>>
>> $X4
>> x
>>    I    L    Q
>> 0.25 0.50 0.25
>>
>> $X5
>> x
>>    I    V
>> 0.75 0.25
>>
>> $X6
>> x
>>    P    S
>> 0.75 0.25
>>
>> $X7
>> x
>>    D    E    G
>> 0.25 0.50 0.25
>>
>> $X8
>> x
>>    A    C
>> 0.75 0.25
>>
>>
>> This could, of course, then be modified to add zero proportions for
>> all non-appearing amino acids.
>>
>> -- Cheers,
>> Bert
>>
>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrid...@inbox.com> wrote:
>>>
>>> I think this does what you want using two packages, plyr and reshape2,
>>> that you may have to install. If so,
>>> install.packages(c("plyr", "reshape2")) should do the trick.
>>> library(plyr)
>>> library(reshape2)
>>> # using supplied file 'myfile' from below
>>> time0total = sum(myfile[,2])
>>> mydata <- myfile[, 2:10]
>>> md1 <- melt(mydata, id = "Time_zero")
>>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>>
>>>
>>> John Kane
>>> Kingston ON Canada
>>>
>>> -----Original Message-----
>>> From: z...@cornell.edu
>>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>>> To: jrkrid...@inbox.com
>>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>>
>>> Hi John,
>>> Thank you for the tips. My apologies about the unreadable sample data...
>>> So here is the output of the sample data, and hopefully it works this
>>> time :)
>>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 = structure(c(1L,
>>> 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
>>> 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
>>> 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 = structure(c(1L,
>>> 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("Proteins",
>>> "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
>>> 4L), class = "data.frame")
>>> And here is my original question:
>>> Basically, I have a bunch of protein sequences composed of different
>>> amino acid residues, and each residue is represented by an uppercase
>>> letter. I want to calculate the ratio of different amino acid residues
>>> at each position of the proteins.
>>>
>>> If I name this table as myfile.txt, I have the following scripts to
>>> calculate the ratio of each amino acid residue at position 1:
>>>
>>> # showing levels of the 3rd column, which means the types of residues
>>>
>>> >myfile[,3]
>>>
>>>
>>> # calculating the ratio of L
>>>
>>> >list=c(which(myfile[,3]=="L"))
>>>
>>> >time0total=sum(myfile[,2])
>>>
>>> >AA_L=0
>>>
>>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>
>>> >ratio_L=AA_L/time0total
>>>
>>>
>>> So how can I write a script to do the same thing for the other two
>>> levels (T and R) in column 3, and also do this for every column that
>>> contains amino acid residues?
>>>
>>> Thanks a lot!
>>>
>>> Regards,
>>>
>>> Zhao
>>> 2012/7/24 John Kane <jrkrid...@inbox.com>
>>>
>>>   First thing is to supply the data in a usable format. As is, it is
>>>   essentially unreadable. All R-beginners do this. :)
>>>   Have a look at the dput function (?dput) for a good way to supply
>>>   sample data in an email.
>>>   If you have a large dataset, probably a few dozen lines of data would
>>>   be fine.
>>>   Something like dput(head(mydata)) should be fine. Just copy and paste
>>>   the output into your email.
>>>   Welcome to R. I think you will like it.
>>>   John Kane
>>>   Kingston ON Canada
>>>
>>> > -----Original Message-----
>>> > From: z...@cornell.edu
>>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>>> > To: r-help@r-project.org
>>> > Subject: [R] How to do the same thing for all levels of a column?
>>> >
>>> > Dear all,
>>> >
>>> > I am an R beginner, and I am looking for a way to do the same thing
>>> > for all levels of a column in a table.
>>> >
>>> > Basically, I have a bunch of protein sequences composed of different
>>> > amino acid residues, and each residue is represented by an uppercase
>>> > letter.
>>> > I want to calculate the ratio of different amino acid residues at each
>>> > position of the proteins. Here is an example table:
>>> >
>>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>>> > p2        0.0002731  T  E  N  L  V  P  G  A
>>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>>> >
>>> > If I name this table as myfile.txt, I have the following scripts to
>>> > calculate the ratio of each amino acid residue at position 1:
>>> >
>>> > # showing levels of the 3rd column, which means the types of residues
>>> >
>>> > >myfile[,3]
>>> >
>>> > # calculating the ratio of L
>>> >
>>> > >list=c(which(myfile[,3]=="L"))
>>> >
>>> > >time0total=sum(myfile[,2])
>>> >
>>> > >AA_L=0
>>> >
>>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>> >
>>> > >ratio_L=AA_L/time0total
>>> >
>>> > So how can I write a script to do the same thing for the other two
>>> > levels (T and R) in column 3, and also do this for every column that
>>> > contains amino acid residues?
>>> >
>>> > Many thanks for any help you could give me on this topic! :)
>>> >
>>> > Regards,
>>> >
>>> > Zhao
>>> > --
>>> > Zhao JIN
>>> > Ph.D. Candidate
>>> > Ruth Ley Lab
>>> > 467 Biotech
>>> > Field of Microbiology, Cornell University
>>> > Lab: 607.255.4954
>>> > Cell: 412.889.3675
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help@r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
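For later readers of this thread, here is a minimal, self-contained sketch of the final suggestion. It rests on a few assumptions that are not stated in the thread: the undefined f in the corrected call is taken to be the Time_zero column (this choice reproduces the proportions posted at the top), the sample data are rebuilt with data.frame() instead of pasting the dput() output, and lapply() is used in place of sapply() so that the result is always a list. The names ratios, all_aa and ratio_matrix are made up for illustration. The second half is only one possible way to add the zero proportions for non-appearing amino acids that Bert mentions; it was not posted by anyone in the thread.

```r
# Sample data from the thread (same values as the dput() output).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Assumption: 'f' in the thread is the Time_zero weight of each protein.
f <- myfile$Time_zero

# For each residue column: sum the weights per amino acid, then divide by
# the total so the values in each column add up to 1.
ratios <- lapply(myfile[, -c(1, 2)],
                 function(x) prop.table(tapply(f, x, sum)))

print(ratios$X1)   # L ~0.915, R ~0.037, T ~0.048, as posted above

# Variant with a row for every standard amino acid; residues that do not
# appear at a position get a proportion of 0.
all_aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
            "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

ratio_matrix <- sapply(myfile[, -c(1, 2)], function(x) {
  vapply(all_aa, function(a) sum(f[x == a]), numeric(1)) / sum(f)
})

print(round(ratio_matrix, 4))
```

With the sample data, ratios is a list with one named vector per position (X1 to X8), and ratio_matrix has one row per amino acid and one column per position, so ratio_matrix["L", "X1"] gives the weighted share of L at position 1.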