You found the error: the two data frames use different naming conventions for chr. I should be able to fix that pretty easily.
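For reference, a minimal sketch of that fix, assuming the only difference between the two conventions is the leading "chr" prefix. The toy data frames `snps` and `rsid` and all of their values are invented stand-ins for data_lane6_snps and data_lane6_snps_rsid, not the real data:

```r
# Toy stand-ins for the two data frames in the thread (values are invented).
snps <- data.frame(
  chr   = factor(c("chr1", "chr1", "chr11", "chrX")),
  SNP   = c(100L, 101L, 68143872L, 500L),
  depth = c(1L, 1L, 2L, 2L)
)
rsid <- data.frame(
  chr  = factor(c("11", "1", "X", "2")),
  SNP  = c(68143872L, 101L, 999L, 100L),
  rsid = c("rs10", "rs20", "rs30", "rs40")
)

# Before normalizing, the two chr columns share no labels at all,
# which is why a merge on both keys comes back empty.
shared_before <- length(intersect(as.character(snps$chr),
                                  as.character(rsid$chr)))

# Strip a leading "chr" and compare as plain character vectors,
# so differing factor levels cannot get in the way.
snps$chr <- sub("^chr", "", as.character(snps$chr))
rsid$chr <- as.character(rsid$chr)

# Inner join on both keys: same chromosome AND same position.
merged <- merge(snps, rsid, by = c("chr", "SNP"))
print(merged)
```

In this toy data, position 100 occurs on chromosome 1 in one frame and on chromosome 2 in the other, so it is correctly left out of the result. If the non-matching rows should be kept, merge() also takes all.x, all.y and all arguments for outer joins.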
In case the problem persists, I will contact the list again.

Thanks!
-Abhi

On Tue, Apr 6, 2010 at 5:01 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> OK, not the SNPs. So look at the "chr" values. I will bet that you get 0
> when you try:
>
> length(intersect(data_lane6_snps$chr, data_lane6_snps_rsid$chr))
>
> ... since one is using a format of "chrNN" and the other is using just
> "NN". You need to get the chromosome naming convention straightened out.
>
> --
> David.
>
> On Apr 6, 2010, at 4:53 PM, Abhishek Pratap wrote:
>
>> Just so you know:
>>
>> length(intersect(data_lane6_snps$SNP, data_lane6_snps_rsid$SNP))
>> 796120
>>
>> I just need to include the chr condition now, which is where I am stuck.
>>
>> -Abhi
>>
>> On Tue, Apr 6, 2010 at 4:51 PM, Abhishek Pratap <abhishek....@gmail.com>
>> wrote:
>>
>> Hi David
>>
>> I can see why, looking at the SNP values shown, they appear to be
>> different and a merge would return nothing. However, the columns still
>> have ~700K SNPs in common. What I am looking for is a merge where both
>> the SNP and the chr match. If I match only on the SNP column I get
>> partially correct results, since it is possible for two chromosomes to
>> have a SNP at the same bp location. So the merge needs to take both SNP
>> position and chromosome into account.
>>
>> Thanks!
>> -Abhi
>>
>> On Tue, Apr 6, 2010 at 4:42 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>> On Apr 6, 2010, at 4:03 PM, Abhishek Pratap wrote:
>>
>> Hi David
>>
>> Here it is. You can ignore the bio jargon if it sounds confusing.
>>
>> Sometimes it is essential to have domain details.
>>
>> The data types of the columns (SNP, chr) on which I am applying merge
>> are the same.
>>
>> merge(data_lane6_snps, data_lane6_snps_rsid, by = c("SNP", "chr"))
>>
>> str(data_lane6_snps)
>> 'data.frame': 7724462 obs. of 10 variables:
>>  $ chr           : Factor w/ 25 levels "chr1","chr10",..: 1 1 1 1 1 1 1 1 1 1 ...
>>  $ SNP           : int 100 101 103 108 179 180 191 197 218 222 ...
>>  $ reference     : Factor w/ 5 levels "A","C","G","N",..: 2 2 5 2 2 5 2 2 1 5 ...
>>  $ genotype      : Factor w/ 10 levels "A","C","G","K",..: 1 1 1 8 2 2 3 8 2 2 ...
>>  $ consensus_qual: int 0 0 0 4 33 33 19 19 19 19 ...
>>  $ snp_qual      : int 0 0 0 4 0 33 19 19 19 19 ...
>>  $ rms_qual      : int 0 0 0 0 21 21 21 21 21 21 ...
>>  $ depth         : int 1 1 1 1 2 2 2 2 2 2 ...
>>  $ bases         : Factor w/ 453774 levels "^!,","^!,^!,",..: 5 5 5 410998 49793 155731 284998 416878 133393 133393 ...
>>  $ base_quality  : Factor w/ 555104 levels "`","``","```",..: 359 359 359 54813 92856 92856 92856 92856 92539 55424 ...
>>
>> str(data_lane6_snps_rsid)
>> 'data.frame': 797807 obs. of 4 variables:
>>  $ chr : Factor w/ 24 levels "1","10","11",..: 3 3 3 3 3 3 3 3 3 3 ...
>>  $ SNP : int 68143872 11071026 69423434 12394791 1302846 95330693 3921381 57122299 41899656 76990037 ...
>>
>> Looking at this line and the line for "SNP" in the data frame above, I
>> am not seeing much similarity in range. There are 10 times fewer
>> observations. What was your plan for the non-matching cases? Did you
>> really mean that you wanted a right outer join?
>>
>> You might get information by trying:
>>
>> length(intersect(data_lane6_snps$SNP, data_lane6_snps_rsid$SNP))
>>
>> That would tell you how many potential matches you might have on the
>> basis of SNP numbers, although an SNP match might or might not be a full
>> match given the chr matching that is also being specified.
>>
>>  $ end : int 68143872 11071026 69423434 12394791 1302846 95330693 3921381 57122299 41899656 76990037 ...
>>  $ rsid: Factor w/ 797807 levels "rs10","rs10000010",..: 100229 685690 505395 470219 780326 29342 29263 327909 434159 723152 ...
>>
>> On Tue, Apr 6, 2010 at 3:59 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>> On Apr 6, 2010, at 3:54 PM, Abhishek Pratap wrote:
>>
>> Hi Guys
>>
>> I have two data frames which I would like to merge on two conditions.
>>
>> I am doing the following (abstract form):
>>
>> new.data.frame <- merge(df1, df2, by = c("Col1", "Col2"))
>>
>> So I am guessing that you really wanted just this:
>>
>> new.data.frame <- merge(df1, df2)
>>
>> ?merge
>>
>> Since the default for merge is by = intersect(names(x), names(y)), this
>> would have been equivalent to
>>
>> new.data.frame <- merge(df1, df2, by = c("chr", "SNP"))
>>
>> See above regarding the possibility that you have non-congruent SNP
>> labeling problems.
>>
>> What does
>>
>> str(df1) ; str(df2)
>>
>> ... show?
>>
>> It is giving me a null result.
>>
>> Basically I need to apply two conditions.
>>
>> I also tried sqldf but it is running forever. Will indexing help?
>>
>> temp <- sqldf("select a.chr,a.SNP,a.snp_qual,a.rms_qual,a.depth,b.rsid
>> FROM
>> data_lane6_snps a,
>> data_lane6_snps_rsid b
>> WHERE
>> a.SNP = b.SNP
>> AND
>> a.chr = b.chr
>> ")
>>
>> Thanks!
>> -Abhi
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT