On Fri, Jul 1, 2011 at 12:47 PM, Bansal, Vikas <vikas.ban...@kcl.ac.uk> wrote:
> Dear all,
>
> I am doing a project on variant calling using R. I am working on a pileup
> file. There are 10 columns in my data frame and I want to count the number of
> A, C, G and T in each row for column 9. An example of column 9 is given below:
>
> .a,g,,
> .t,t,,
> .,c,c,
> .,a,,,
> .,t,t,t
> .c,,g,^!.
> .g,ggg.^!,
> .$,,,,,.,
> a,g,,t,
> ,,,,,.,^!.
> ,$,,,,.,.
>
> This is a bit confusing for me, because these characters are all in one
> column. How can I scan each row and print the number of A, C, G and T in it?
> Most of the rows have . and , and other symbols, but we
> will ignore them. I just want to run a loop with a counter which will count
> the number of A, C, G and T for each row and give output something like
> this:
>
>
> A C G T
> 1 0 1 0
> 0 0 0 2
> 0 2 0 0
> 1 0 0 0
> 0 0 0 3
>
> This output is for the first 5 rows of the example given above.
>
Read the lines into L and then, for each of a, c, g and t, remove every other
character and count the number of characters that remain:

Lines <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."

# one element of L per pileup row
L <- readLines(textConnection(Lines))

# gsub("[^a]", "", L) deletes everything except "a"; nchar() counts what is left
data.frame(a = nchar(gsub("[^a]", "", L)),
  c = nchar(gsub("[^c]", "", L)),
  g = nchar(gsub("[^g]", "", L)),
  t = nchar(gsub("[^t]", "", L))
)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
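A caveat on the approach above, assuming the column follows the samtools pileup conventions: upper-case A, C, G, T mark mismatches on the forward strand and lower-case letters mark mismatches on the reverse strand, "^" is followed by a single mapping-quality character (which can itself be a letter), and indels are written as a sign, a length and the bases, e.g. +2ag or -1c. Counting only lower-case letters therefore misses forward-strand bases and can pick up letters from quality characters or indel sequences. The following is a minimal sketch of one way to handle this; count_bases() and strip_indels() are hypothetical helper names, not functions from any package.

```r
# Sketch only: strip_indels() and count_bases() are hypothetical helpers,
# assuming samtools-style pileup read-base strings.

# Remove indel annotations such as "+2ag" or "-1c". The number says how many
# bases follow, so one regular expression cannot remove them reliably.
strip_indels <- function(s) {
  repeat {
    m <- regexpr("[+-][0-9]+", s)
    start <- as.integer(m)
    if (start == -1L) break
    len_digits <- attr(m, "match.length") - 1L   # digits after the sign
    n <- as.integer(substr(s, start + 1L, start + len_digits))
    stop <- start + len_digits + n               # last character of the indel
    s <- paste0(substr(s, 1L, start - 1L), substr(s, stop + 1L, nchar(s)))
  }
  s
}

count_bases <- function(x, bases = c("A", "C", "G", "T")) {
  # "^" starts a read and is followed by one mapping-quality character,
  # which may be a letter, so drop both characters.
  x <- gsub("\\^.", "", x)
  x <- vapply(x, strip_indels, character(1), USE.NAMES = FALSE)
  # Upper case = forward strand, lower case = reverse strand; count both.
  x <- toupper(x)
  # Same trick as above: delete everything except one base, count the rest.
  counts <- lapply(bases, function(b) nchar(gsub(paste0("[^", b, "]"), "", x)))
  names(counts) <- bases
  as.data.frame(counts)
}

L <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
count_bases(L)
```

For these first five rows the result matches the A C G T table in the question, with upper-case column names. Removing "^" plus its quality character before stripping indels matters, because a quality character can be "+", "-" or a digit.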