On Fri, Jul 1, 2011 at 12:47 PM, Bansal, Vikas <vikas.ban...@kcl.ac.uk> wrote:
> Dear all,
>
> I am doing a project on variant calling using R. I am working on a pileup
> file. There are 10 columns in my data frame and I want to count the number of
> A, C, G and T in each row for column 9. An example of column 9 is given below:
>
>            .a,g,,
>            .t,t,,
>            .,c,c,
>            .,a,,,
>            .,t,t,t
>            .c,,g,^!.
>            .g,ggg.^!,
>            .$,,,,,.,
>            a,g,,t,
>            ,,,,,.,^!.
>            ,$,,,,.,.
>
> This is a bit confusing for me, as these characters are all in one column. How
> can we scan them for each row to print the number of A, C, G and T in each row?
> Most of the rows have . and , and other symbols, but we
> will ignore them. I just want to run a loop with a counter that counts
> the number of A, C, G and T in each row and gives output something like
> this:
>
>
> A   C   G  T
> 1   0   1  0
> 0   0   0  2
> 0   2   0  0
> 1   0   0  0
> 0   0   0  3
>
> This output is for the first 5 rows of the example given above.
>

Read the lines into L. Then, for each of a, c, g and t, remove every other
character and count the characters that remain in each string:

Lines <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."

L <- readLines(textConnection(Lines))

data.frame(a = nchar(gsub("[^a]", "", L)),
        c = nchar(gsub("[^c]", "", L)),
        g = nchar(gsub("[^g]", "", L)),
        t = nchar(gsub("[^t]", "", L))
)
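
The same idea can be written once and applied to all four bases, which also
makes it easy to match upper-case and lower-case letters and to label the
columns A, C, G and T as in the requested output. This is only a sketch under
some assumptions: the names pileup, bases and count_base are made up for
illustration, and the step that drops "^" plus the following character assumes
the usual pileup convention that the character after "^" encodes a mapping
quality rather than a base. Indel markers such as +2AG are not handled here.

```r
# Same sample rows as in the example above (variable names are illustrative).
pileup <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t",
            ".c,,g,^!.", ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,",
            ",,,,,.,^!.", ",$,,,,.,.")

bases <- c("A", "C", "G", "T")

# Count one base (either case) in every string of x.
count_base <- function(b, x) {
  # Assumption: "^" is followed by a mapping-quality character, so drop both.
  x <- gsub("\\^.", "", x)
  # Delete everything that is not the base, then count what remains.
  pattern <- paste0("[^", b, tolower(b), "]")
  nchar(gsub(pattern, "", x))
}

# One column per base, one row per input string.
counts <- as.data.frame(sapply(bases, count_base, x = pileup))
head(counts, 5)
```

For a data frame, the character vector would be the ninth column, for example
as.character(df[[9]]) if the data frame is called df (a hypothetical name).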

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
