Re: [R] For help in R coding

David Winsemius Sun, 03 Jul 2011 12:59:23 -0700


On Jul 3, 2011, at 1:07 PM, Bansal, Vikas wrote:

Yes, you are right. The unlist operation is unnecessary; I tried it yesterday and it works without that operation too. But I have one more problem that I worked on the whole day without finding a solution. As I told you, I am new to R, and I want to ask how I can use an if condition in the following code:
df <- read.table("Case2.pileup", fill = TRUE, sep = "\t", colClasses = "character")
txtvec <- readLines(textConnection(df[, 9]))
dad <- data.frame(
  A = sapply(gregexpr("A|a", df[, 9]), function(x) if (x[[1]] != -1) length(x) else 0),
  C = sapply(gregexpr("C|c", df[, 9]), function(x) if (x[[1]] != -1) length(x) else 0),
  G = sapply(gregexpr("G|g", df[, 9]), function(x) if (x[[1]] != -1) length(x) else 0),
  T = sapply(gregexpr("T|t", df[, 9]), function(x) if (x[[1]] != -1) length(x) else 0),
  N = sapply(gregexpr("\\,|\\.", df[, 9]), function(x) if (x[[1]] != -1) length(x) else 0))
Now my problem: my data frame also has the letters A, C, G and T in the 3rd column, and the commas (,) and dots (.) in column 9 stand for the letter in column 3 of the same row. I want to use an if condition like this:

if column 3 of my data frame has A then
  A = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
  A = sapply(gregexpr("A|a", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
if column 3 has C then
  C = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
  C = sapply(gregexpr("C|c", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
if column 3 has G then
  G = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
  G = sapply(gregexpr("G|g", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
if column 3 has T then
  T = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
  T = sapply(gregexpr("T|t", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)


I finally figured out that you wanted this:

> dat$newcol <- apply(dat, 1, function(x) gsub("\\,|\\.", x[3],x[9]) )

# So that replaces any instance of "," or "." in col9 with the letter in col3

# Then the same old routine as yesterday

> dat$A <- sapply(gregexpr("A|a", dat[, "newcol"]), function(x) if (x[[1]] != -1) length(x) else 0)
> dat$C <- sapply(gregexpr("C|c", dat[, "newcol"]), function(x) if (x[[1]] != -1) length(x) else 0)
> dat$G <- sapply(gregexpr("G|g", dat[, "newcol"]), function(x) if (x[[1]] != -1) length(x) else 0)
> dat$T <- sapply(gregexpr("T|t", dat[, "newcol"]), function(x) if (x[[1]] != -1) length(x) else 0)


> dat[, c("A","C", "G", "T")]
   A C G T
1  1 0 1 4
2  4 0 0 2
3  4 2 0 0
4  1 5 0 0
5  0 0 4 3
6  5 1 1 0
7  4 0 4 0
8  8 0 0 0
9  1 4 1 1
10 0 0 0 8
11 0 0 0 8
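
A sketch of the same replace-then-count idea packaged as a function that takes the two columns as plain character vectors, so no apply() over the whole data frame (which coerces it to a character matrix) is needed. `count_bases()`, `ref` and `reads` are hypothetical names; the data are the 11 rows posted in this thread. With the real data frame the call would presumably be count_bases(dat[, 3], dat[, 9]).

```r
# Hypothetical helper (not from the thread): for each row, replace every ","
# and "." in the read string with that row's reference base (column 3), then
# count A, C, G and T case-insensitively.
count_bases <- function(ref, reads) {
  # mapply() pairs each reference base with its own read string.
  expanded <- mapply(function(r, s) gsub("[,.]", r, s), ref, reads,
                     USE.NAMES = FALSE)
  bases <- c("A", "C", "G", "T")
  counts <- lapply(bases, function(b) {
    # gregexpr() returns -1 for "no match", so guard before taking length().
    vapply(gregexpr(b, expanded, ignore.case = TRUE),
           function(m) if (m[[1]] != -1) length(m) else 0L,
           integer(1))
  })
  names(counts) <- bases
  as.data.frame(counts)
}

# The 11 rows posted in this thread (column 3 and column 9).
ref   <- c("T", "A", "A", "C", "G", "A", "A", "A", "C", "T", "T")
reads <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
           ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")
counts <- count_bases(ref, reads)
counts
```

Building the result from a named list (rather than letting sapply() simplify) keeps it a data frame with columns A, C, G and T even when there is only one row.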


So I want code that gives output like this:

DATA FRAME (Input)

col3   col9
 T     .a,g,,
 A     .t,t,,
 A     .,c,c,
 C     .,a,,,
 G     .,t,t,t
 A     .c,,g,^!.
 A     .g,ggg.^!,
 A     .$,,,,,.,
 C     a,g,,t,
 T     ,,,,,.,^!.
 T     ,$,,,,.,.


output

A  C  G  T
1  0  1  4
4  0  0  2
4  2  0  0
1  5  0  0
0  0  4  3



This is the output for the first five rows.

Can you please help me use this if condition in your code? Or can we do it with some other approach rather than an if condition?
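
The row-by-row "if" above can be written without a loop using ifelse(), the vectorized form of if/else. A minimal sketch, assuming (as in pileup data) that a row's own reference base never also appears as a mismatch letter, so adding the letter count to the ","/"." count reproduces the if/else in the question. `count_matches()` is a hypothetical helper; the data are the first five rows above.

```r
# Hypothetical helper: number of (case-insensitive) matches of `pattern`
# in each element of `x`; gregexpr() reports -1 when there is no match.
count_matches <- function(pattern, x) {
  vapply(gregexpr(pattern, x, ignore.case = TRUE),
         function(m) if (m[[1]] != -1) length(m) else 0L,
         integer(1))
}

ref   <- c("T", "A", "A", "C", "G")                              # column 3
reads <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")   # column 9

n_ref <- count_matches("[,.]", reads)   # reads matching the reference base
bases <- c("A", "C", "G", "T")
# For each base: its own letter count, plus the ","/"." count on the rows
# whose reference (column 3) is that base.
out <- sapply(bases, function(b)
  count_matches(b, reads) + ifelse(toupper(ref) == b, n_ref, 0L))
out
```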

________________________________________
From: David Winsemius [dwinsem...@comcast.net]
Sent: Sunday, July 03, 2011 3:57 AM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help@r-project.org
Subject: Re: [R] For help in R coding

On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote:

DEAR ALL,
I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY...

df <- read.table("Case2.pileup", fill = TRUE, sep = "\t", colClasses = "character")

txt <- df[, 9]
txtvec <- readLines(textConnection(txt))
dad <- data.frame(
  A = unlist(sapply(gregexpr("A|a", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  C = unlist(sapply(gregexpr("C|c", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  G = unlist(sapply(gregexpr("G|g", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  T = unlist(sapply(gregexpr("T|t", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  N = unlist(sapply(gregexpr("\\,|\\.", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)))


The unlist operation is unnecessary since the sapply operation returns
a vector.  (It doesn't hurt, but it is unnecessary.)
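
To illustrate the point about unlist(): a short sketch showing that it is a no-op on the vector sapply() returns, plus one edge case worth knowing. With zero-length input sapply() cannot simplify and returns a list, whereas vapply() always returns the declared type. The variable names are illustrative only.

```r
x <- c(".a,g,,", ".t,t,,")
# sapply() simplifies a list of length-1 results to an atomic vector,
# so wrapping it in unlist() changes nothing.
n_sapply <- sapply(x, nchar, USE.NAMES = FALSE)
identical(unlist(n_sapply), n_sapply)   # TRUE

# Zero-length input: sapply() gives back a list, vapply() an integer vector.
empty_sapply <- sapply(character(0), nchar)
empty_vapply <- vapply(character(0), nchar, integer(1))
is.list(empty_sapply)      # TRUE
is.integer(empty_vapply)   # TRUE
```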





Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsem...@comcast.net]
Sent: Saturday, July 02, 2011 9:04 PM
To: Dennis Murphy
Cc: r-help@r-project.org; Bansal, Vikas
Subject: Re: [R] For help in R coding

On reflection and a bit of testing, I think the best approach would be to use gregexpr. For counting the number of commas, this appears quite straightforward.

sapply(gregexpr("\\,", txtvec), function(x) if (x[[1]] != -1) length(x) else 0)
[1] 3 3 3 4 3 3 2 6 4 6 6

It easily generalizes to the period and to the `|` (or) operation on letters.

(I did need to add the check, since the result of gregexpr always has length at least one, but it has the value -1 when there is no match.)
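
A small sketch of why that check is needed: gregexpr() returns match positions, and "no match" is encoded as a single -1 rather than an empty vector, so length() alone would report 1. `count_hits()` is a hypothetical name for the guard used in the thread.

```r
m_hit  <- gregexpr(",", ".a,g,,")[[1]]   # string with three commas
m_miss <- gregexpr(",", "aggt")[[1]]     # string with none
as.vector(m_hit)    # 3 5 6 (positions of the commas)
as.vector(m_miss)   # -1
length(m_miss)      # 1, even though nothing matched

# The guard used above, as a named function.
count_hits <- function(m) if (m[[1]] != -1) length(m) else 0L
count_hits(m_hit)   # 3
count_hits(m_miss)  # 0
```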

sapply(gregexpr("t|T", txtvec), function(x) if (x[[1]] != -1) length(x) else 0)
[1] 0 2 0 0 3 0 0 0 1 0 0
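
For comparison, another common idiom (not one used in this thread) deletes the matches with gsub() and compares string lengths with nchar(). A sketch with a hypothetical `count_chars()`; it only yields a match count when every match is exactly one character long, which holds for these single-character patterns.

```r
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
            ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")
# Characters removed by gsub() = number of single-character matches.
count_chars <- function(pattern, x) nchar(x) - nchar(gsub(pattern, "", x))
count_chars(",", txtvec)      # 3 3 3 4 3 3 2 6 4 6 6
count_chars("[tT]", txtvec)   # 0 2 0 0 3 0 0 0 1 0 0
```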


On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote:

Hi:

There seems to be a problem if the string ends in "," or ".": strsplit() does not pick up a separator at the very end of the string when splitting on those characters. Here is an alternative that splits into individual characters and uses charmatch() instead:

charsum <- function(s, char) {
 u <- strsplit(s, "")
 sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
}

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x) charsum(x, '.'))),
                    commas  = unname(sapply(txtvec, function(x) charsum(x, ','))))
txtvec
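
An equivalent count that compares each character directly instead of going through charmatch(); a sketch, not Dennis's code. It assumes the leading whitespace has already been removed from txtvec.

```r
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
            ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")
chars <- strsplit(txtvec, "")   # one vector of single characters per row
# sum() of a logical vector counts the TRUE values.
dfout <- data.frame(
  periods = vapply(chars, function(ch) sum(ch == "."), integer(1)),
  commas  = vapply(chars, function(ch) sum(ch == ","), integer(1))
)
dfout
```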

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius <dwinsem...@comcast.net> wrote:

On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:

Dear all,

I am doing a project on variant calling using R. I am working on a pileup file. There are 10 columns in my data frame, and I want to count the number of A, C, G and T in each row of column 9. An example of column 9 is given below:

      .a,g,,
      .t,t,,
      .,c,c,
      .,a,,,
      .,t,t,t
      .c,,g,^!.
      .g,ggg.^!,
      .$,,,,,.,
      a,g,,t,
      ,,,,,.,^!.
      ,$,,,,.,.

This is a bit confusing for me, as these characters are all in one column. How can we scan each row and print the number of A, C, G and T for each row?
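
One caveat about the data, assuming the samtools pileup conventions: column 9 also contains read-start markers ("^" followed by one mapping-quality character) and read-end markers ("$"). A mapping-quality character that happens to be a letter such as "C" would then be counted as a base by a plain letter pattern. A minimal sketch that strips those markers first, using a hypothetical helper `strip_markers()`; indel notation such as "+2AG" is deliberately not handled.

```r
# Hypothetical helper: drop "^" plus the following mapping-quality
# character, then drop "$". Indels ("+2AG", "-1t") are NOT handled.
strip_markers <- function(x) gsub("\\$", "", gsub("\\^.", "", x))

strip_markers(c(".c,,g,^!.", ".$,,,,,.,", "^Ca,"))
# ".c,,g,."  ".,,,,,.,"  "a,"
```

Removing "^." before "$" matters: if the quality character itself were "$", stripping "$" first would leave a bare "^" that then swallows the next real base.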


Seems a bit clunky but this does the job (first the data):


> txt <- " .a,g,,
+            .t,t,,
+            .,c,c,
+            .,a,,,
+            .,t,t,t
+            .c,,g,^!.
+            .g,ggg.^!,
+            .$,,,,,.,
+            a,g,,t,
+            ,,,,,.,^!.
+            ,$,,,,.,."

txtvec <- readLines(textConnection(txt))
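
The indentation inside txt ends up in the values (and later in the row names). A sketch of an alternative that splits on newlines without a connection and trims that whitespace with trimws(); the three-line txt here is a shortened copy of the data above.

```r
txt <- " .a,g,,
           .t,t,,
           .,c,c,"
# Split on the newline character (fixed string, not a regex),
# then remove leading/trailing whitespace from each line.
txtvec <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
txtvec   # ".a,g,," ".t,t,," ".,c,c,"
```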


Now the clunky solution. It basically subtracts 1 from the number of "fragments" that result from splitting on each letter in turn. It could be made prettier with a function that did the job.

> data.frame(A = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "a"), length), "-", 1)),
+            C = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "c"), length), "-", 1)),
+            G = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "g"), length), "-", 1)),
+            T = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "t"), length), "-", 1)))
             A C G T
.a,g,,       1 0 1 0
.t,t,,       0 0 0 2
.,c,c,       0 2 0 0
.,a,,,       1 0 0 0
.,t,t,t      0 0 0 2
.c,,g,^!.    0 1 1 0
.g,ggg.^!,   0 0 4 0
.$,,,,,.,    0 0 0 0
a,g,,t,      1 0 1 1
,,,,,.,^!.   0 0 0 0
,$,,,,.,.    0 0 0 0

It has the advantage that the input data end up as rownames, which was a surprise.

If you wanted to count "A" and "a" as equivalent, then the split argument should be "a|A".

AS YOU MENTIONED, IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE THIS.

BUT CAN I ALSO COUNT . AND , USING-
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split=".|,"), length) , "-", 1)),

I TRIED IT BUT IT'S NOT WORKING. IT GIVES OUTPUT, BUT IN SOME PLACES IT SHOWS TOO MANY . AND , AND ELSEWHERE IT DOES NOT COUNT THEM AT ALL AND JUST SHOWS 0.


You need to use valid regular expressions for 'split'. Since "." is a special character (it matches any single character), it needs to be escaped when you want the literal period to be recognized as such; "," is not special, but escaping it does no harm.
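
A quick sketch of the difference on one row of the data: unescaped, ".|," matches every character in the string, whereas the escaped form, a character class, or fixed = TRUE matches only the literal period.

```r
x <- ".a,g,,"
# Unescaped "." is a wildcard, so ".|," matches all 6 characters.
lengths(gregexpr(".|,", x))               # 6
# Escaped, or inside [ ], "." is a literal period: 1 period + 3 commas.
lengths(gregexpr("\\.|,", x))             # 4
lengths(gregexpr("[.,]", x))              # 4
# fixed = TRUE turns regex interpretation off entirely.
lengths(gregexpr(".", x, fixed = TRUE))   # 1
```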

I haven't figured out why, but you need to drop the final operation of subtracting 1 from the values when counting commas:

data.frame(periods = unlist(lapply(lapply(sapply(txtvec, strsplit,
                                                 split = "\\."), length), "-", 1)),
           commas  = unlist(lapply(sapply(txtvec, strsplit,
                                          split = "\\,"), length)))
             periods commas
.a,g,,             1      3
.t,t,,             1      3
.,c,c,             1      3
.,a,,,             1      4
.,t,t,t            1      4
.c,,g,^!.          1      4
.g,ggg.^!,         2      2
.$,,,,,.,          2      6
a,g,,t,            0      4
,,,,,.,^!.         1      7
,$,,,,.,.          1      7
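
A sketch of what is going on, which matches Dennis's remark about strings ending in a separator: strsplit() keeps an empty piece for a separator at the start of a string but not for one at the very end. So "pieces - 1" is right only when the string does not end with the separator, and "pieces" only when it does. That is why both the T count for ".,t,t,t" in the letter table (2, though there are 3) and several comma counts in the table above (for example 4 for ".,t,t,t", which has 3) are off by one. `count_matches()` is a hypothetical helper using the gregexpr() approach.

```r
p1 <- strsplit(".,t,t,t", "t")[[1]]   # ".," "," ","     : 3 pieces, 3 t's
p2 <- strsplit(".,t,t,t", ",")[[1]]   # "." "t" "t" "t"  : 4 pieces, 3 commas
p3 <- strsplit(".a,g,,", ",")[[1]]    # ".a" "g" ""      : 3 pieces, 3 commas

# Counting match positions sidesteps the trailing-separator issue.
count_matches <- function(pattern, x) {
  vapply(gregexpr(pattern, x),
         function(m) if (m[[1]] != -1) length(m) else 0L,
         integer(1))
}
count_matches(",", c(".,t,t,t", ".a,g,,"))   # 3 3
```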

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
