Hi,

I am given the following table:

> head(hsa_refseq)
   chr       genome region    start     stop nu strand nu.1    nu.2   gene_id
1 chr1 hg19_refGene    CDS 67000042 67000051  0      +    0 gene_id NM_032291
2 chr1 hg19_refGene   exon 66999825 67000051  0      +    . gene_id NM_032291
3 chr1 hg19_refGene    CDS 67091530 67091593  0      +    2 gene_id NM_032291
4 chr1 hg19_refGene   exon 67091530 67091593  0      +    . gene_id NM_032291
5 chr1 hg19_refGene    CDS 67098753 67098777  0      +    1 gene_id NM_032291
6 chr2 hg19_refGene   exon 67098753 67098777  0      +    . gene_id NM_032291

So far I have counted how many entries in the third column (region) are "CDS" and how many are "exon":

sum(hsa_refseq$region=="CDS")
sum(hsa_refseq$region=="exon")

But what I would like is to print, for each chromosome, how many entries are exons and how many are CDS. For example:

chr1 has 5 CDS and 2 exons
chr2 has 10 CDS and 3 exons
...

Can you tell me what I should add? Or, if I am doing this wrong, how should I do it?

Thank you,
Regards,
Nanami

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
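
One possible way to get the per-chromosome counts asked about above is to cross-tabulate the chr and region columns with base R's table(). The sketch below is only an illustration: it assumes the columns are named chr and region as in the head() output, and it builds a small stand-in data frame from the six rows shown rather than using the real hsa_refseq. The name counts is made up for this example.

```r
# Stand-in for hsa_refseq, built only from the six rows shown by head();
# the real data frame has more rows and columns (assumption for illustration).
hsa_refseq <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr2"),
  region = c("CDS", "exon", "CDS", "exon", "CDS", "exon"),
  stringsAsFactors = FALSE
)

# Cross-tabulate chromosome (rows) against region type (columns)
counts <- table(hsa_refseq$chr, hsa_refseq$region)
print(counts)

# Print one sentence per chromosome in the requested style
for (chr in rownames(counts)) {
  cat(chr, "has", counts[chr, "CDS"], "CDS and",
      counts[chr, "exon"], "exons\n")
}
```

The two sum() calls give totals over the whole table, whereas table() splits the same counts by chromosome in one step. If the region column holds values other than "CDS" and "exon", they would simply appear as extra columns in counts.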