Re: [R] Duplicated genes

arun Mon, 09 Sep 2013 17:43:56 -0700

Hi,

Maybe you can try this:
# genes that occur only once
dat1New <- dat1[!(duplicated(dat1$gene) | duplicated(dat1$gene, fromLast = TRUE)), ]
# all copies of genes that occur more than once
dat2 <- dat1[duplicated(dat1$gene) | duplicated(dat1$gene, fromLast = TRUE), ]
lst1 <- split(dat2, dat2$gene)
# For each duplicated gene, keep the copy that has the higher value in more of
# the expression columns (6:32); on a tie the second copy is kept.
# Assumes there are no more than 2 copies of any gene (none in this dataset has more).
dat3 <- unsplit(lapply(lst1, function(x) {
  x1 <- sum(apply(x[, 6:32], 2, function(y) y[1] >= y[2]))
  x2 <- sum(apply(x[, 6:32], 2, function(y) y[1] <= y[2]))
  if (x1 > x2) x[1, ] else x[2, ]
}), unique(dat2$gene))
dat4 <- rbind(dat1New, dat3)
# restore the original row order
dat5 <- dat4[order(as.numeric(row.names(dat4))), ]
dim(dat5)
#[1] 639  32
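The pairwise rule above only works when a gene has at most two copies. A minimal sketch of one way to generalize it, assuming (as the rule above does) that a larger value means "more differentially expressed": for each gene, count in how many expression columns each copy holds the column maximum, and keep the copy that wins most often. The toy data frame and the names `expr_cols` and `idx` are hypothetical; with the real data `expr_cols` would be `6:32`.

```r
# Hypothetical toy data: a 'gene' column plus three expression columns;
# gene "B" occurs three times.
dat1 <- data.frame(
  gene = c("A", "B", "B", "C", "B"),
  s1 = c(1, 2, 5, 3, 4),
  s2 = c(1, 6, 1, 3, 8),
  s3 = c(1, 7, 2, 3, 9),
  stringsAsFactors = FALSE
)
expr_cols <- 2:4  # hypothetical; with the thread's data this would be 6:32

# For each gene, find the row index (in dat1) of the copy that holds the
# column maximum in the most expression columns.
idx <- unlist(lapply(split(seq_len(nrow(dat1)), dat1$gene), function(i) {
  m <- as.matrix(dat1[i, expr_cols, drop = FALSE])
  wins <- tabulate(apply(m, 2, which.max), nbins = length(i))
  i[which.max(wins)]  # ties go to the earliest copy
}))

# Keep one row per gene, in the original row order
dat5 <- dat1[sort(idx), ]
dat5
```

Because `which.max()` returns the first maximum, ties go to the earliest copy, unlike the pairwise rule above, which keeps the second copy on a tie. If the columns hold signed log fold changes, applying `abs()` to `m` first may be closer to "most differentially expressed".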

A.K.

________________________________
From: Vivek Das <vd4mm...@gmail.com>
To: arun <smartpink...@yahoo.com> 
Sent: Monday, September 9, 2013 2:30 PM
Subject: Re: Duplicated genes

Actually, these are all differentially expressed genes, so the copy that is most 
differentially expressed should stay in the list and its duplicates should be 
removed. Can you tell me again how to do that? I think the script will change 
then, right?

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek....@ieo.eu
            vchris...@yahoo.co.in
            vd4mm...@gmail.com

On Mon, Sep 9, 2013 at 8:27 PM, arun <smartpink...@yahoo.com> wrote:

>Hi,
>Try:
>dat1 <- read.table("DEGs_all.txt", sep = "", header = TRUE, stringsAsFactors = FALSE)
>dim(dat1)
>#[1] 725  32
>length(unique(dat1$gene))
>#[1] 639
>dat2 <- dat1[!duplicated(dat1$gene), ]
>dim(dat2)
>#[1] 639  32
>
>dim(unique(dat1))
>#[1] 725  32
>
>The duplicated genes have different expression values (dim(unique(dat1)) shows 
>that no two rows are completely identical). You didn't say how to choose among 
>the copies, so here the first row of each duplicated gene is kept and the 
>others are removed.
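To see how `duplicated()` behaves here, a small sketch on a hypothetical data frame: `duplicated()` alone flags every copy after the first, so `!duplicated()` keeps the first copy, while combining it with `fromLast = TRUE` flags every copy of a duplicated gene, which is how the genes that occur only once are isolated in the reply at the top of this thread. `table()` counts the copies, a quick way to check the "no more than 2 copies" assumption made there.

```r
# Hypothetical toy data: gene "B" occurs three times
dat1 <- data.frame(gene = c("A", "B", "B", "C", "B"), s1 = 1:5,
                   stringsAsFactors = FALSE)

duplicated(dat1$gene)                   # FALSE FALSE TRUE FALSE TRUE
duplicated(dat1$gene, fromLast = TRUE)  # FALSE TRUE TRUE FALSE FALSE

# Keep the first copy of every gene
first_copy <- dat1[!duplicated(dat1$gene), ]

# Keep only genes that occur exactly once
is_dup <- duplicated(dat1$gene) | duplicated(dat1$gene, fromLast = TRUE)
singles <- dat1[!is_dup, ]

# Number of copies per gene; a maximum above 2 breaks a pairwise rule
copies <- table(dat1$gene)
max(copies)
```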
>
>Alternatively, suppose you want the mean values of those rows:
>library(plyr)
>res <- ddply(dat1[, c(1, 6:32)], .(gene), numcolwise(mean, na.rm = TRUE))
>dim(res)
>#[1] 639  28
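If you would rather not depend on plyr, base R's `aggregate()` can compute the same per-gene column means. A minimal sketch on a hypothetical data frame, assuming the gene names are in the first column and every other column is numeric; `na.rm = TRUE` is passed through to `mean()`, as in the `ddply()` call above.

```r
# Hypothetical toy data: gene "B" has two copies, one with a missing value
dat1 <- data.frame(gene = c("A", "B", "B", "C"),
                   s1 = c(1, 2, 4, 3),
                   s2 = c(5, NA, 6, 7),
                   stringsAsFactors = FALSE)

# Column-wise mean of every numeric column within each gene;
# extra arguments (na.rm = TRUE) are passed on to mean()
res <- aggregate(dat1[, -1], by = list(gene = dat1$gene),
                 FUN = mean, na.rm = TRUE)
res
```

The data-frame interface is used on purpose: the formula interface, `aggregate(. ~ gene, ...)`, defaults to `na.action = na.omit` and would drop any row containing an NA before averaging, whereas here only `na.rm` handles the missing values.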
>
>A.K.
>
>________________________________
>From: Vivek Das <vd4mm...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Sent: Monday, September 9, 2013 1:35 PM
>Subject: Urgent help
>
>I have a list of genes that I want to reduce to its unique genes. The genes 
>have expression values, but some of them are duplicated. Is there a way to 
>remove the duplicate names so that each gene appears only once with its 
>corresponding values? Please see the attached matrix.
>
>It would be nice if you could let me know. It's a bit urgent.
>
>----------------------------------------------------------
>
>Vivek Das
>PhD Student in Computational Biology
>Giuseppe Testa's Lab
>European School of Molecular Medicine
>IFOM-IEO Campus
>Via Adamello, 16
>Milan, Italy
>
>emails: vivek....@ieo.eu
>            vchris...@yahoo.co.in
>            vd4mm...@gmail.com
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
