Hi Dan, Thanks a lot for this! It works but I have some minor issues. Let's take the working example for instance, which x is of 100 rows and y of 120 rows.
After merge(), z is of 120 rows and is sorted by the ascending order of "SEARCH_KEY1". The following is what I got out of the working example: > z[1:4,1:3] SEARCH_KEY1 PROBE_ID1 PROBE_ID2 1 1 0.02816032 0.3202740 2 2 -1.96321867 -0.7636817 3 3 0.01571277 -0.9202248 4 4 0.39734092 0.4306223 By any chance, I could get a "merged" z but it does not contain any "SEARCH_KEY" inherited from data frame "y"? Because ideally, I just want to substitute x$PROBE_ID1 with y$PROBE_ID2 based on the common "SEARCH_KEY" (meaning when "x$SEARCH_KEY1=y$SEARCH_KEY2") but I don't want to increase the rows of x. Any thoughts on this? I appreciate your help and have a good evening! Best, Allen On Wed, Oct 8, 2008 at 6:00 PM, Daniel Malter <[EMAIL PROTECTED]> wrote: > Hi Allen, look at the following working example: > > x=data.frame(rnorm(100),1:100) > names(x)=c("PROBE_ID1","SEARCH_KEY1") > y=data.frame(rnorm(120),1:120) > names(y)=c("PROBE_ID2","SEARCH_KEY2") > > z=merge(x,y,by.x="SEARCH_KEY1",by.y="SEARCH_KEY2",all.x=T,all.y=T) > z # check the resulting data frame it should have 100 rows and contain > SEARCH_KEY1, PROBE_ID1, and PROBE_ID2 > > That is, you have to put SEARCH_KEY1 and SEARCH_KEY2 in quotes, which I > forgot in my previous email. > > Cheers, > Daniel > > > ------------------------- > cuncta stricte discussurus > ------------------------- > > > ------------------------------ > *Von:* ss [mailto:[EMAIL PROTECTED] > *Gesendet:* Wednesday, October 08, 2008 5:00 PM > *An:* Daniel Malter > *Cc:* R help > *Betreff:* Re: [R] How to join the two tables based on one overlapped > column > > > Dear Daniel, > > Thank you very much for the help! > > I tried the code and got the following: > > > > John<-read.table(file="John_probe.txt",header=TRUE,row.names=NULL,fill=TRUE) > > > Susan<-read.table(file="Susan_probe.txt",header=TRUE,row.names=NULL,fill=TRUE) > > dim(John) > [1] 48701 2 > > dim(Susan) > [1] 46713 2 > > dataToMerge=data.frame(John$PROBE_ID2, John$SEARCH_KEY2) > > > mergedData=merge(Susan,dataToMerge,by.x=SEARCH_KEY1,by.y=SEARCH_KEY2,all.x=T,all.y=F) > Error in fix.by(by.x, x) : object "SEARCH_KEY1" not found > > > > > I modified the last one by specifying the file name and got : > > > > mergedData=merge(Susan,dataToMerge,by.x=Susan$SEARCH_KEY1,by.y=John$SEARCH_KEY2,all.x=T,all.y=F) > Error in fix.by(by.x, x) : 'by' must specify valid column(s) > > > > Have you got any clue about this? > > Thanks much, > Allen > > > On Wed, Oct 8, 2008 at 11:46 AM, Daniel Malter <[EMAIL PROTECTED]> wrote: > >> dataToMerge=data.frame(yourtablename2$PROBE_ID2, >> yourtablename2$SEARCH_KEY2) >> ##Puts the two columns of interest in dataset 2 in a separate data frame. >> >> >> mergedData=merge(yourtablename1,dataToMerge,by.x=SEARCH_KEY1,by.y=SEARCH_KEY >> 2,all.x=T,all.y=F) >> ##merges the first table with the data frame just created looking for >> matches between SEARCH_KEY1 and SEARCH_KEY2, all entries in dataset 1 are >> retained (whether matched or not), entries in the dataToMerge dataframe >> that >> do not match any entries in dataset 1 are dropped. >> >> You then have an additional column in "mergedData" that contains the >> PROBE_ID2 and you can just assign them to PROBE_ID1 (i.e. replace >> PROBE_ID1 >> by the values in this column). >> >> Cheers, >> Daniel >> >> ------------------------- >> cuncta stricte discussurus >> ------------------------- >> >> -----Ursprüngliche Nachricht----- >> Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] >> Im >> Auftrag von ss >> Gesendet: Wednesday, October 08, 2008 11:33 AM >> An: R help >> Betreff: [R] How to join the two tables based on one overlapped column >> >> Dear list, >> >> I need some clues on this. I have two excel files and I basically want to >> map one to the other one. Can you give me some hints how to do it? >> >> The first excel file, named as "Susan_probe.xls", there are two columns, >> "PROBE_ID1" and "SEARCH_KEY1" >> >> PROBE_ID1 SEARCH_KEY1 ILMN_30212 ILMN_30212 ILMN_1285 ILMN_1285 >> ILMN_137964 ILMN_137964 ILMN_138109 ILMN_138109 ... >> >> The second excel file, named as "John_probe.xls", there are two columns as >> well, "PROBE_ID2" and "SEARCH_KEY2". >> >> PROBE_ID2 SEARCH_KEY2 ILMN_1809034 ILMN_16367 ILMN_1660305 ILMN_16583 >> ILMN_1792173 ILMN_19158 ... >> >> There are 46713 rows in the first excel file and 49702 rows in the second >> file. >> >> Probes in the first columns of two excel files are different but they can >> be >> matched based on the second column "SEARCH_KEY". So what I want to do is >> to >> substitute the "PROBE_ID1" in the "Susan_probe.xls" file with the >> "PROBE_ID2" in the "John_probe.xls" based on their common "SEARCH_KEY". >> >> Thank you so much for your help. I really appreciate. >> >> All the best, >> Allen >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.