Re: [R] How to join the two tables based on one overlapped column

ss Wed, 08 Oct 2008 17:35:08 -0700

Hi Dan,

Thanks a lot for this! It works but I have some minor issues.
Let's take the working example for instance, which x is of
100 rows and y of 120 rows.


After merge(), z is of 120 rows and is sorted by the ascending order
of "SEARCH_KEY1".

The following is what I got out of the working example:

> z[1:4,1:3]
  SEARCH_KEY1   PROBE_ID1  PROBE_ID2
1           1  0.02816032  0.3202740
2           2 -1.96321867 -0.7636817
3           3  0.01571277 -0.9202248
4           4  0.39734092  0.4306223




By any chance, I could get a "merged" z but it does not contain
any "SEARCH_KEY" inherited from data frame "y"? Because ideally, I just
want to substitute x$PROBE_ID1 with y$PROBE_ID2 based on the
common "SEARCH_KEY" (meaning when "x$SEARCH_KEY1=y$SEARCH_KEY2")
but I don't want to increase the rows of x. Any thoughts on this?

I appreciate your help and have a good evening!

Best,
     Allen




On Wed, Oct 8, 2008 at 6:00 PM, Daniel Malter <[EMAIL PROTECTED]> wrote:

>  Hi Allen, look at the following working example:
>
> x=data.frame(rnorm(100),1:100)
> names(x)=c("PROBE_ID1","SEARCH_KEY1")
> y=data.frame(rnorm(120),1:120)
> names(y)=c("PROBE_ID2","SEARCH_KEY2")
>
> z=merge(x,y,by.x="SEARCH_KEY1",by.y="SEARCH_KEY2",all.x=T,all.y=T)
> z # check the resulting data frame it should have 100 rows and contain
> SEARCH_KEY1, PROBE_ID1,  and PROBE_ID2
>
> That is, you have to put SEARCH_KEY1 and SEARCH_KEY2 in quotes, which I
> forgot in my previous email.
>
> Cheers,
> Daniel
>
>
> -------------------------
> cuncta stricte discussurus
> -------------------------
>
>
>  ------------------------------
> *Von:* ss [mailto:[EMAIL PROTECTED]
> *Gesendet:* Wednesday, October 08, 2008 5:00 PM
> *An:* Daniel Malter
> *Cc:* R help
> *Betreff:* Re: [R] How to join the two tables based on one overlapped
> column
>
>
> Dear Daniel,
>
> Thank you very much for the help!
>
> I tried the code and got the following:
>
> >
> John<-read.table(file="John_probe.txt",header=TRUE,row.names=NULL,fill=TRUE)
> >
> Susan<-read.table(file="Susan_probe.txt",header=TRUE,row.names=NULL,fill=TRUE)
> > dim(John)
> [1] 48701     2
> > dim(Susan)
> [1] 46713     2
> > dataToMerge=data.frame(John$PROBE_ID2, John$SEARCH_KEY2)
> >
> mergedData=merge(Susan,dataToMerge,by.x=SEARCH_KEY1,by.y=SEARCH_KEY2,all.x=T,all.y=F)
> Error in fix.by(by.x, x) : object "SEARCH_KEY1" not found
> >
>
>
> I modified the last one by specifying the file name and got :
>
> >
> mergedData=merge(Susan,dataToMerge,by.x=Susan$SEARCH_KEY1,by.y=John$SEARCH_KEY2,all.x=T,all.y=F)
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
> >
>
> Have you got any clue about this?
>
> Thanks much,
>        Allen
>
>
> On Wed, Oct 8, 2008 at 11:46 AM, Daniel Malter <[EMAIL PROTECTED]> wrote:
>
>> dataToMerge=data.frame(yourtablename2$PROBE_ID2,
>> yourtablename2$SEARCH_KEY2)
>> ##Puts the two columns of interest in dataset 2 in a separate data frame.
>>
>>
>> mergedData=merge(yourtablename1,dataToMerge,by.x=SEARCH_KEY1,by.y=SEARCH_KEY
>> 2,all.x=T,all.y=F)
>> ##merges the first table with the data frame just created looking for
>> matches between SEARCH_KEY1 and SEARCH_KEY2, all entries in dataset 1 are
>> retained (whether matched or not), entries in the dataToMerge dataframe
>> that
>> do not match any entries in dataset 1 are dropped.
>>
>> You then have an additional column in "mergedData" that contains the
>> PROBE_ID2 and you can just assign them to PROBE_ID1 (i.e. replace
>> PROBE_ID1
>> by the values in this column).
>>
>> Cheers,
>> Daniel
>>
>> -------------------------
>> cuncta stricte discussurus
>> -------------------------
>>
>> -----Ursprüngliche Nachricht-----
>> Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> Im
>> Auftrag von ss
>> Gesendet: Wednesday, October 08, 2008 11:33 AM
>> An: R help
>> Betreff: [R] How to join the two tables based on one overlapped column
>>
>> Dear list,
>>
>> I need some clues on this.  I have two excel files and I basically want to
>> map one to the other one. Can you give me some hints how to do it?
>>
>> The first excel file, named as "Susan_probe.xls", there are two columns,
>> "PROBE_ID1" and "SEARCH_KEY1"
>>
>>  PROBE_ID1 SEARCH_KEY1  ILMN_30212 ILMN_30212  ILMN_1285 ILMN_1285
>> ILMN_137964 ILMN_137964  ILMN_138109 ILMN_138109 ...
>>
>> The second excel file, named as "John_probe.xls", there are two columns as
>> well, "PROBE_ID2" and "SEARCH_KEY2".
>>
>>  PROBE_ID2 SEARCH_KEY2  ILMN_1809034 ILMN_16367  ILMN_1660305 ILMN_16583
>> ILMN_1792173 ILMN_19158 ...
>>
>> There are 46713 rows in the first excel file and 49702 rows in the second
>> file.
>>
>> Probes in the first columns of two excel files are different but they can
>> be
>> matched based on the second column "SEARCH_KEY". So what I want to do is
>> to
>> substitute the "PROBE_ID1" in the "Susan_probe.xls" file with the
>> "PROBE_ID2" in the "John_probe.xls" based on their common "SEARCH_KEY".
>>
>> Thank you so much for your help. I really appreciate.
>>
>> All the best,
>>      Allen
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to join the two tables based on one overlapped column

Reply via email to