Re: [R] Creating binary variable depending on strings of two dataframes

2011-05-10 noxyp...@gmail.com
On Fri, May 6, 2011 at 7:41 PM, David Winsemius  wrote:
>
> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>
>>
>> Gabor Grothendieck wrote:
>>>
>>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyp...@gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>> consider the following two dataframes:
>>>> x1=c("232","3454","3455","342","13")
>>>> x2=c("1","1","1","0","0")
>>>> data1=data.frame(x1,x2)
>>>>
>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>>> data2=data.frame(y1,y2)
>>>>
>>>> I need a new column x3 in data1 that is 1 if data2 has a row with
>>>> y1 equal to x1 and y2 equal to "E1", and 0 otherwise. The result for
>>>> data1 should look like this:
>>>>     x1 x2 x3
>>>> 1  232  1  1
>>>> 2 3454  1  1
>>>> 3 3455  1  0
>>>> 4  342  0  0
>>>> 5   13  0  1
>>>>
>>>> I think a SQL command could help me, but I am too inexperienced with
>>>> SQL to get there.
>>>>
>>>
>>> Try this:
>>>
>>>> library(sqldf)
>>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2
>>>> on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>
>>>     x1 x2 x3
>>> 1  232  1  1
>>> 2 3454  1  1
>>> 3 3455  1  0
>>> 4  342  0  0
>>> 5   13  0  1
>>>
>>>
> snipped Gabor's sig
>>
>> That works nicely, but I need to automate this a bit more. Consider the
>> following example:
>>
>> list1=c("A01","B04","A64","G84","F19")
>>
>> x1=c("232","3454","3455","342","13")
>> x2=c("1","1","1","0","0")
>> data1=data.frame(x1,x2)
>>
>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>> data2=data.frame(y1,y2)
>>
>> I now want to create a loop that creates, for every value in list1, a new
>> binary variable in data1. The result should look like:
>> x1      x2      A01     B04     A64     G84     F19
>> 232     1       0       1       0       0       0
>> 3454    1       0       0       1       0       1
>> 3455    1       0       0       0       0       0
>> 342     0       0       0       0       0       0
>> 13      0       1       0       0       1       1
>
> Loops!?! We don't need no steenking loops!
>
>> xtb <-  with(data2, table(y1,y2))
>> cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>        x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
> 232   232  1   0   0   1   1   0   0   0   0   0
> 3454 3454  1   0   1   0   0   0   1   0   0   0
> 3455 3455  1   0   0   0   0   1   0   0   0   0
> 342   342  0   0   0   0   0   0   0   0   0   1
> 13     13  0   1   0   0   0   0   1   1   1   0
>
> I am guessing that you were too ... er, busy? ... to complete the table?
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
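David's cross-tab returns a count for every code that appears in data2. A minimal base R sketch (not from the thread) that keeps only the codes in list1, in list1's order, and clips the counts to 0/1 so the layout matches the table requested above. It assumes character columns (stringsAsFactors = FALSE); xtb, ind and result are illustrative names.

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")

data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04", "F19", "A64", "E22",
                           "H44", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)

# Cross-tabulate IDs (rows, in data1 order) against the wanted codes
# (columns, in list1 order); codes outside list1 become NA and are dropped.
xtb <- table(factor(data2$y1, levels = data1$x1),
             factor(data2$y2, levels = list1))

# Counts -> 0/1 indicators, keeping the dimnames.
ind <- unclass(xtb)
ind[] <- as.integer(ind > 0)

result <- data.frame(data1, ind)
rownames(result) <- NULL
result
```

Building the row factor from data1$x1 keeps data1's row order, and an ID with no rows in data2 would simply get all zeros.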

Thanks a lot! Pretty simple. I have gotten so used to sqldf.

So how would you handle more complicated strings like these:
y1=c("232","232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
     "F68","G84","F19","A01")
data2=data.frame(y1,y2)

How would you extract, for instance, every "A01" from the strings?
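A possible first step, sketched in base R under the assumption that the codes inside each y2 string are separated by whitespace: split every string into its codes and repeat the ID once per code. The resulting long data frame has one code per row, so the exact-match approaches quoted above (the sqldf join or the table() cross-tab) can be applied to it unchanged. tokens, long and a01_ids are hypothetical names; lengths() needs R 3.2.0 or later, and sapply(tokens, length) is the older equivalent.

```r
y1 <- c("232", "232", "232", "3454", "3454", "3455",
        "342", "13", "13", "13", "13")
y2 <- c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
        "H44 C35", "F68", "G84", "F19", "A01")
data2 <- data.frame(y1, y2, stringsAsFactors = FALSE)

# Split every y2 entry on whitespace: one character vector of codes per row.
tokens <- strsplit(data2$y2, "[[:space:]]+")

# Repeat each ID once per code to get a long (ID, code) table with one
# code per row, i.e. the same shape as data2 in the earlier examples.
long <- data.frame(y1 = rep(data2$y1, lengths(tokens)),
                   y2 = unlist(tokens),
                   stringsAsFactors = FALSE)

# IDs that carry an exact "A01" code (no partial matches).
a01_ids <- unique(long$y1[long$y2 == "A01"])
a01_ids
```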

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating binary variable depending on strings of two dataframes

2011-05-10 noxyp...@gmail.com
On Tue, May 10, 2011 at 3:09 PM, David Winsemius wrote:

>
> On May 10, 2011, at 3:18 AM, noxyp...@gmail.com wrote:
>
>> On Fri, May 6, 2011 at 7:41 PM, David Winsemius
>> wrote:
>>
>>>
>>> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>>>
>>>
>>>> Gabor Grothendieck wrote:
>>>>
>>>>>
>>>>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyp...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> consider the following two dataframes:
>>>>>> x1=c("232","3454","3455","342","13")
>>>>>> x2=c("1","1","1","0","0")
>>>>>> data1=data.frame(x1,x2)
>>>>>>
>>>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>>>>> data2=data.frame(y1,y2)
>>>>>>
>>>>>> I need a new column x3 in data1 that is 1 if data2 has a row with
>>>>>> y1 equal to x1 and y2 equal to "E1", and 0 otherwise. The result for
>>>>>> data1 should look like this:
>>>>>>     x1 x2 x3
>>>>>> 1  232  1  1
>>>>>> 2 3454  1  1
>>>>>> 3 3455  1  0
>>>>>> 4  342  0  0
>>>>>> 5   13  0  1
>>>>>>
>>>>>> I think a SQL command could help me, but I am too inexperienced with
>>>>>> SQL to get there.
>>>>>>
>>>>>>
>>>>> Try this:
>>>>>
>>>>>> library(sqldf)
>>>>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2
>>>>>> d2 on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>>>
>>>>>
>>>>>     x1 x2 x3
>>>>> 1  232  1  1
>>>>> 2 3454  1  1
>>>>> 3 3455  1  0
>>>>> 4  342  0  0
>>>>> 5   13  0  1
>>>>>
>>>>>
>>> snipped Gabor's sig
>>>
>>>>
>>>> That works nicely, but I need to automate this a bit more. Consider
>>>> the following example:
>>>>
>>>> list1=c("A01","B04","A64","G84","F19")
>>>>
>>>> x1=c("232","3454","3455","342","13")
>>>> x2=c("1","1","1","0","0")
>>>> data1=data.frame(x1,x2)
>>>>
>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>>>> data2=data.frame(y1,y2)
>>>>
>>>> I now want to create a loop that creates, for every value in list1, a
>>>> new binary variable in data1. The result should look like:
>>>> x1    x2  A01  B04  A64  G84  F19
>>>> 232   1   0    1    0    0    0
>>>> 3454  1   0    0    1    0    1
>>>> 3455  1   0    0    0    0    0
>>>> 342   0   0    0    0    0    0
>>>> 13    0   1    0    0    1    1
>>>>
>>>
>>> Loops!?! We don't need no steenking loops!
>>>
>>>> xtb <- with(data2, table(y1,y2))
>>>> cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>>>
>>>        x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
>>> 232   232  1   0   0   1   1   0   0   0   0   0
>>> 3454 3454  1   0   1   0   0   0   1   0   0   0
>>> 3455 3455  1   0   0   0   0   1   0   0   0   0
>>> 342   342  0   0   0   0   0   0   0   0   0   1
>>> 13     13  0   1   0   0   0   0   1   1   1   0
>>>
>>> I am guessing that you were too ... er, busy? ... to complete the table?
>>>
>>> --
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>>
>>>
>> Thanks a lot! Pretty simple. I have gotten so used to sqldf.
>>
>> So how would you handle more complicated strings like these:
>> y1=c("232","232","232","3454","3454","3455","342","13","13","13","13")
>> y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
>>      "F68","G84","F19","A01")
>> data2=data.frame(y1,y2)
>>
>> How would you extract, for instance, every "A01" from the strings?
>>
>
> I think you need either to explain what you want in more words of the
> English language or to offer an example of the desired output. I suspect you
> did not want something as simple as this:
>
> > A01.instances <- grep("A01" , data2$y2)
> > A01.instances
> [1]  2 11
> > data2[A01.instances, ]
>     y1          y2
> 2  232 B04 A01 F19
> 11  13         A01
>
> Or maybe you did?
>
> --
> David Winsemius, MD
> West Hartford, CT
>
>
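Note that grep("A01", ...) matches A01 anywhere inside a string, so a longer code such as "A010" (hypothetical, not in the example data) would also match. A sketch, assuming character columns, that anchors the pattern at word boundaries with \\b and then reduces the matching rows to one 0/1 value per ID in data1:

```r
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(
  y1 = c("232", "232", "232", "3454", "3454", "3455",
         "342", "13", "13", "13", "13"),
  y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
         "H44 C35", "F68", "G84", "F19", "A01"),
  stringsAsFactors = FALSE)

# "\\b" anchors the match at word boundaries, so the pattern matches the
# code A01 but not a longer code such as "A010".
has_a01 <- grepl("\\bA01\\b", data2$y2)

# An ID gets 1 if any of its data2 rows contains the code, else 0.
data1$A01 <- as.integer(data1$x1 %in% data2$y1[has_a01])
data1
```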
With sqldf I could do it manually:

> > data1=sqldf("SELECT data1.*, max(data2.y2 LIKE '% A01%') OR max(data2.y2
> LIKE 'A01%') A01 FROM data1 left join data2 on (data1.x1 = data2.y1) group
> by data1.x1, data2.y1")
> > data1=sqldf("SELECT data1.*, max(data2.y2 LIKE '% B04%') OR max(data2.y2
> LIKE 'B04%') B04 FROM data1 left join data2 on (data1.x1 = data2.y1) group
> by data1.x1, data2.y1")
> > data1
>     x1 x2 A01 B04
> 1   13  0   1   0
> 2  232  1   1   1
> 3  342  0   0   0
> 4 3454  1   0   0
> 5 3455  1   0   0
> >
>
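The per-code queries above can be generated instead of typed out. A base R sketch of the same idea, assuming the codes contain only letters and digits (regex metacharacters would need escaping); codes is a hypothetical stand-in for the real list. Each code becomes one grepl() pass over data2 with word boundaries, which also avoids LIKE '% A01%' matching a longer code such as A010. Rows keep data1's original order rather than the sorted order produced by group by.

```r
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(
  y1 = c("232", "232", "232", "3454", "3454", "3455",
         "342", "13", "13", "13", "13"),
  y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
         "H44 C35", "F68", "G84", "F19", "A01"),
  stringsAsFactors = FALSE)

codes <- c("A01", "B04")  # hypothetical stand-in for the full code list

# One integer column per code: 1 if any data2 row for that ID contains
# the code as a whole word, else 0. vapply() names the columns by code.
ind <- vapply(codes, function(code) {
  hit <- grepl(paste0("\\b", code, "\\b"), data2$y2)
  as.integer(data1$x1 %in% data2$y1[hit])
}, integer(nrow(data1)))

result <- cbind(data1, ind)
result
```

This still scans data2 once per code, so with thousands of codes and a large data2 it may be slow.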

But I need to automate this for several thousand substrings.
Any suggestions?
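For thousands of codes, one option is to split the strings once and let a single table() call produce all indicator columns, instead of one query or regex pass per code. A base R sketch, assuming whitespace-separated codes and character columns; codes is a hypothetical stand-in for the real list, which may hold thousands of entries. Using levels = codes fixes the column order, gives an all-zero column to any code that never occurs, and drops codes outside the list. lengths() needs R 3.2.0 or later.

```r
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(
  y1 = c("232", "232", "232", "3454", "3454", "3455",
         "342", "13", "13", "13", "13"),
  y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
         "H44 C35", "F68", "G84", "F19", "A01"),
  stringsAsFactors = FALSE)

codes <- c("A01", "B04", "A64", "G84", "F19")  # hypothetical full list

# Tokenize once: one (ID, code) pair per code occurrence.
tokens <- strsplit(data2$y2, "[[:space:]]+")
ids    <- rep(data2$y1, lengths(tokens))
tok    <- unlist(tokens)

# Tabulate once: rows follow data1$x1, columns follow `codes`; tokens not
# in `codes` become NA and are dropped by table().
counts <- table(factor(ids, levels = data1$x1),
                factor(tok, levels = codes))

# Counts -> 0/1 indicators, then attach them to data1.
ind <- unclass(counts)
ind[] <- as.integer(ind > 0)
result <- data.frame(data1, ind)
rownames(result) <- NULL
result
```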
