Hi Jim, thanks for getting back to me. Can you please confirm whether you can see the attached plot?

Thanks
Ana

On Thu, Jan 23, 2020 at 8:06 PM Jim Lemon <drjimle...@gmail.com> wrote:
>
> Hi Ana,
> You seem to be working on an identification or classification problem.
> Your sample plot didn't come through, perhaps try converting it to a
> PDF or PNG.
> I may be missing something, but I can't see how randomly selecting 30
> values from almost 4 million is going to mean anything in terms of
> statistical significance. I hope you will pardon me for saying that it
> looks like a "p-trawl". It is easy to select cases where the p-value
> is less than 0.05:
>
> a[a$pvalue < 0.05,]
>
> Maybe what you want to do is display this subset of your data as
> candidates for a match among the very large number of non-matches.
> Let's do a bit of damage to your sample data and add the proportions:
>
> a<-read.table(text="rs pvalue pSNP
> rs185642176 0.0267407 0.6
> rs184120752 0.0787681 0.3
> rs10904045 0.0508162 0.4
> rs35849539 0.0875910 0.2
> rs141633513 0.0787759 0.2
> rs4468273 0.0542171 0.4
> rs4567378 0.0539484 0.4
> rs7084251 0.0126445 0.7
> rs181605000 0.0787838 0.35
> rs12255619 0.0192719 0.61
> rs140367257 0.0788008 0.25
> rs10904178 0.0969814 0.16
> rs7918960 0.0436341 0.45
> rs61688896 0.0526256 0.39
> rs151283848 0.0787284 0.34
> rs140174295 0.0989107 0.11
> rs145945079 0.0787015 0.23
> rs4881370 0.0455089 0.51
> rs183895035 0.0787015 0.22
> rs181749526 0.0787015 0.22",
> header=TRUE,stringsAsFactors=FALSE)
> alt05<-a[a$pvalue < 0.05,]
> library(plotrix)
> segmat<-matrix(c(alt05$pSNP,alt05$pSNP-0.1,alt05$pSNP+0.1,rep(1,5)),
>  nrow=4,byrow=TRUE)
> rownames(segmat)<-c("prop","lower","upper","N")
> centipede.plot(segmat,mar=c(4,6,3,4),
>  main="Proportion of SNPs",
>  left.labels=alt05$rs,right.labels=rep("",5))
>
> This is probably not what you want, but it is a start.
>
> Jim
>
> On Fri, Jan 24, 2020 at 7:08 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
> >
> > Hello,
> >
> > I have a data frame which looks like this:
> >
> > > head(a,20)
> >              rs   pvalue
> >  1: rs185642176 0.267407
> >  2: rs184120752 0.787681
> >  3:  rs10904045 0.508162
> >  4:  rs35849539 0.875910
> >  5: rs141633513 0.787759
> >  6:   rs4468273 0.542171
> >  7:   rs4567378 0.539484
> >  8:   rs7084251 0.126445
> >  9: rs181605000 0.787838
> > 10:  rs12255619 0.192719
> > 11: rs140367257 0.788008
> > 12:  rs10904178 0.969814
> > 13:   rs7918960 0.436341
> > 14:  rs61688896 0.526256
> > 15: rs151283848 0.787284
> > 16: rs140174295 0.989107
> > 17: rs145945079 0.787015
> > 18:   rs4881370 0.455089
> > 19: rs183895035 0.787015
> > 20: rs181749526 0.787015
> > > dim(a)
> > [1] 3859763 2
> >
> > What I would like to do is to take random subsets of 30 of those rs
> > throughout the dataframe and find out which subsets of those generated
> > have FDR value <0.05
> >
> > FDR I would calculate I guess with:
> > a$fdr=p.adjust(a$pvalue,method="BH")
> >
> > but I also guess I would be calculating only FDR for a particular
> > subset of 30 randomly chosen rs, not for the whole data set.
> >
> > The result I would like to present like in the attached plot. The
> > x-axis say proportion of SNPs and in my case SNP is equivalent to rs
> >
> > Can you please help with this, I really don't have idea how to go about
> > this.
> >
> > Thanks
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
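
For reference, the subset-wise FDR idea from the original question could be sketched roughly as below. This is only an illustration under stated assumptions: the p-values are simulated with runif() as a stand-in for the real 3,859,763-row table, and the names n_snps, n_subsets, subset_size, prop_sig and hits are hypothetical, as is the choice of 1000 draws. Only p.adjust(..., method="BH") and the subset size of 30 come from the thread.

```r
# Sketch only: simulated p-values stand in for the real data frame 'a'.
set.seed(42)              # arbitrary seed so the draws are reproducible
n_snps <- 10000           # hypothetical size, far smaller than the real table
a <- data.frame(rs = paste0("rs", seq_len(n_snps)),
                pvalue = runif(n_snps),
                stringsAsFactors = FALSE)

n_subsets   <- 1000       # assumed number of random draws
subset_size <- 30         # subset size from the original question

# For each draw: sample 30 rows without replacement, BH-adjust the
# p-values within that subset only, and record the proportion of SNPs
# whose adjusted p-value falls below 0.05.
prop_sig <- replicate(n_subsets, {
  idx <- sample(nrow(a), subset_size)
  fdr <- p.adjust(a$pvalue[idx], method = "BH")
  mean(fdr < 0.05)
})

# Indices of the draws that contain at least one SNP with FDR < 0.05
hits <- which(prop_sig > 0)

# Distribution of the per-subset proportions
table(prop_sig)
```

With purely random p-values like these, only a small share of the draws should contain any hit at all, which is in line with the concern raised above about what random subsets of 30 can show. On the real data, prop_sig would be the quantity to put on a "proportion of SNPs" axis.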