Re: [R] Duplicated function with conditional statement

vanessa van der vaart Sat, 27 Jul 2013 17:29:14 -0700

Dear all,
Thank you all for your help. It has been very helpful, but it is not
exactly what I am looking for. Apparently I haven't explained the condition
very clearly. I hope this works.


If the value in the product column duplicates the value from a previous row
(this applies to rows with response == buy as well as response == sample), and
that previous row has response == buy, then the new value is 1;
otherwise it is 0.
So, in that case,
if the value is duplicated but the previous row it duplicates has
response == sample, then it is not considered duplicated, and the value in
the new column is 0.
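
One way to read the rule above, as a minimal sketch in base R: a row gets 1 when its product value already occurred in some earlier row whose response is "buy". The name flag_prior_buy is made up for illustration, and "previous row" is assumed to mean any earlier row, not only the one immediately before.

```r
# Hypothetical helper (not from the thread): 1 if product[i] already
# appeared in an earlier row with response == "buy", else 0.
flag_prior_buy <- function(product, response) {
  is_buy <- response == "buy"
  vapply(seq_along(product), function(i) {
    earlier <- seq_len(i - 1L)                      # positions strictly before row i
    prior_buys <- product[earlier][is_buy[earlier]] # products bought earlier
    as.integer(product[i] %in% prior_buys)
  }, integer(1))
}

# The example data from the original question
subj     <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
response <- c("sample", "sample", "buy", "sample", "buy",
              "sample", "sample", "buy", "sample", "buy")
product  <- c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
tt <- data.frame(subj, response, product)

tt$newcolumn <- flag_prior_buy(tt$product, tt$response)
tt$newcolumn
# [1] 0 0 0 0 0 1 1 0 1 0
```

Under this reading a "buy" row is also flagged when it repeats an earlier "buy" product. If only "sample" rows should ever get a 1, the result can be combined with response == "sample".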

Thank you very much in advance,
I really appreciate it.


On Sat, Jul 27, 2013 at 3:45 AM, arun <smartpink...@yahoo.com> wrote:

>
>
> On some slightly different datasets:
> tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> 1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"),
>     product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
>     5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
> tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
> 1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"),
>     product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4,
>     2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
> tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
> 1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"),
>     product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2,
>     2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
>
> #Tried David's solution:
> tt1$rown <- rownames(tt1)
> as.numeric ( apply(tt1, 1, function(x) {
>     x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response ==
> "buy", "product"]  } ) )
>   # gave inconsistent results, especially since the first 10 rows were from `tt`
> # [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
>
> #similarly for `tt2` and `tt3`.
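
A likely reason for the inconsistent result above, offered as a guess rather than something stated in the thread: rownames() returns character strings, so `<` compares them as text and "10" sorts before "2". With only 10 rows this happens not to change the answer. A minimal sketch that compares integer positions instead (tt1 is rebuilt here so the block runs on its own; pos and prior_buy are illustrative names):

```r
# tt1 from the message above, rebuilt compactly
tt1 <- data.frame(
  subj     = rep(1:8, times = c(3, 2, 3, 2, 3, 3, 3, 3)),
  response = c("buy", "sample")[c(2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1,
                                  2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1)],
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
               5, 4, 3, 4, 5, 4, 2)
)

# Row names are character, so "<" on them is a string comparison
"10" < "2"
# [1] TRUE

# Use integer positions rather than row names
pos <- seq_len(nrow(tt1))
prior_buy <- vapply(pos, function(i) {
  tt1$product[i] %in% tt1$product[pos < i & tt1$response == "buy"]
}, logical(1))

as.numeric(prior_buy)
# [1] 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1

# Keeping only "sample" rows gives the same newCol that fun1 prints for tt1
as.numeric(prior_buy & tt1$response == "sample")
# [1] 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0
```

Whether "buy" rows that repeat an earlier "buy" should also be flagged is a design choice the thread leaves open; the two vectors above show both variants.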
>
>
> ## Created this function.  It seems to work in the tested cases, though it
> ## is not tested extensively.
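> fun1 <- function(dat, colName, newColumn) {
>     # Row positions of the "buy" rows
>     indx <- which(dat[, colName] == "buy")
>     dat[, newColumn] <- 0
>     dat[unlist(lapply(seq_along(indx), function(i) {
>         # x1: rows between this buy and the next one (or the end of the data)
>         x1 <- if (i == length(indx)) {
>             seq(indx[i], nrow(dat))
>         } else if ((indx[i + 1] - indx[i]) == 1) {
>             indx[i]
>         } else {
>             seq(indx[i] + 1, indx[i + 1] - 1)
>         }
>         # x2: all buy rows seen so far plus the rows in x1
>         x2 <- dat[unique(c(indx[i:1], x1)), ]
>         # Use colName here instead of a hard-coded `response` column
>         x3 <- x2[which(x2[, colName] == "sample"), ]
>         x4 <- x2[which(x2[, colName] == "buy"), ]
>         # Row names of sample rows whose product matches an earlier buy
>         if (nrow(x3) != 0) {
>             row.names(x3)[x3$product %in% x4$product]
>         }
>     })), newColumn] <- 1
>     dat
> }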
> fun1(tt,"response","newCol")
> #   subj response product rown newCol
> #1     1   sample       1    1      0
> #2     1   sample       2    2      0
> #3     1      buy       3    3      0
> #4     2   sample       2    4      0
> #5     2      buy       2    5      0
> #6     3   sample       3    6      1
> #7     3   sample       2    7      1
> #8     3      buy       1    8      0
> #9     4   sample       1    9      1
> #10    4      buy       4   10      0
>
> fun1(tt1,"response","newCol")
> #   subj response product newCol
> #1     1   sample       1      0
> #2     1   sample       2      0
> #3     1      buy       3      0
> #4     2   sample       2      0
> #5     2      buy       2      0
> #6     3   sample       3      1
> #7     3   sample       2      1
> #8     3      buy       1      0
> #9     4   sample       1      1
> #10    4      buy       4      0
> #11    5      buy       4      0
> #12    5   sample       2      1
> #13    5      buy       2      0
> #14    6      buy       4      0
> #15    6   sample       5      0
> #16    6   sample       5      0
> #17    7   sample       4      1
> #18    7      buy       3      0
> #19    7      buy       4      0
> #20    8      buy       5      0
> #21    8   sample       4      1
> #22    8      buy       2      0
> #Also
>  fun1(tt2,"response","newCol")
>  fun1(tt3,"response","newCol")
> A.K.
>
> P.S.  Below is OP's clarification regarding the conditional statement in a
> private message:
>
> I am sorry I didn't ask the question very clearly. Let me restate the
> conditional statement; I hope you can understand. I will explain by
> example.
>
> As you can see, almost every number is duplicated, but only in the
> 6th, 7th, and 9th rows is the value in the new column 1.
>
> In the 4th row, the value is duplicated (2 already occurred in the 2nd row),
> but a value is considered duplicated only if it duplicates a value from a
> row where the response is 'buy', so the value in the new column for the
> 4th row is still zero.
>
> In the 6th row, the value in the product column is 3. 3 already occurred
> in the 3rd row, where the response is 'buy', so the value in the new column
> should be 1.
>
> I hope this makes the conditional statement clear.
>
> ----- Original Message -----
> From: David Winsemius <dwinsem...@comcast.net>
> To: David Winsemius <dwinsem...@comcast.net>
> Cc: R-help@r-project.org; Uwe Ligges <lig...@statistik.tu-dortmund.de>
> Sent: Friday, July 26, 2013 5:16 PM
> Subject: Re: [R] Duplicated function with conditional statement
>
>
> On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:
>
> >
> > On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
> >
> >>
> >>
> >> On 25.07.2013 21:05, vanessa van der vaart wrote:
> >>> Hi everybody,
> >>> I have a question about the R function duplicated(). I have spent days
> >>> trying to figure this out, but I can't find a solution yet. I hope
> >>> somebody can help me.
> >>> This is my data:
> >>>
> >>> subj=c(1,1,1,2,2,3,3,3,4,4)
> >>> response=c('sample','sample','buy','sample','buy','sample',
> >>>            'sample','buy','sample','buy')
> >>> product=c(1,2,3,2,2,3,2,1,1,4)
> >>> tt=data.frame(subj, response, product)
> >>>
> >>> the data look like this:
> >>>
> >>>    subj response product
> >>> 1     1   sample       1
> >>> 2     1   sample       2
> >>> 3     1      buy       3
> >>> 4     2   sample       2
> >>> 5     2      buy       2
> >>> 6     3   sample       3
> >>> 7     3   sample       2
> >>> 8     3      buy       1
> >>> 9     4   sample       1
> >>> 10    4      buy       4
> >>>
> >>> I want to create a new column based on the values in the response and
> >>> product columns. If the value in product is duplicated, then the value
> >>> in the new column is 1, otherwise it is 0.
> >>
> >>
> >> According to your description:
> >>
> >
> > Agree that the description did not match the output. I tried to match
> > the output using a rule that could be expressed as:
> >
> > if( a "buy"-associated "product" value precedes the current "product"
> > value){1}else{0}
> >
>
> So this delivers the specified output:
>
> tt$rown <- rownames(tt)
> as.numeric ( apply(tt, 1, function(x) {
>      x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response ==
> "buy", "product"]  } ) )
>
> # [1] 0 0 0 0 0 1 1 0 1 0
>
> > --
> > David.
> >
> >> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
> >>
> >> which is different from what you show us below, from which I cannot
> >> derive any systematic rule.
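
For comparison, a small sketch of what the literal one-liner above returns on the example data, next to the output requested in the original post. The names literal and wanted are illustrative only.

```r
product  <- c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
response <- c("sample", "sample", "buy", "sample", "buy",
              "sample", "sample", "buy", "sample", "buy")

# Literal reading: a repeated product on a row that is itself a "buy"
literal <- as.integer(duplicated(product) & response == "buy")
literal
# [1] 0 0 0 0 1 0 0 1 0 0

# Output requested in the original post
wanted <- c(0, 0, 0, 0, 0, 1, 1, 0, 1, 0)

# Rows where the two disagree
which(literal != wanted)
# [1] 5 6 7 8 9
```

The literal version tests the response of the current row, while the requested output depends on the response of the earlier row that holds the same product.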
> >>
> >> Uwe Ligges
> >>
> >>> But I want to add a conditional statement: the value in the product
> >>> column will only be considered duplicated if the value in the response
> >>> column is 'buy'.
> >>> For illustration, the table should look like this:
> >>>
> >>>    subj response product newcolumn
> >>> 1     1   sample       1         0
> >>> 2     1   sample       2         0
> >>> 3     1      buy       3         0
> >>> 4     2   sample       2         0
> >>> 5     2      buy       2         0
> >>> 6     3   sample       3         1
> >>> 7     3   sample       2         1
> >>> 8     3      buy       1         0
> >>> 9     4   sample       1         1
> >>> 10    4      buy       4         0
> >>>
> >>>
> >>> Can somebody help me?
> >>> Any help will be appreciated.
> >>> I am new to this mailing list, so forgive me in advance if I did not
> >>> ask the question appropriately.
> >>>
> >>>     [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >
> > David Winsemius
> > Alameda, CA, USA
>
> David Winsemius
> Alameda, CA, USA
