Dear all, thank you all for your help. It has been a great help, but it is not exactly what I am looking for. Apparently I haven't explained the condition very clearly. I hope this explanation works.
If the value in column product is duplicated from a previous row (this applies to rows with response == "buy" and response == "sample"), and it duplicates a row whose value in column 'response' is "buy", then the value = 1; otherwise it is 0. So if the value is duplicated, but it duplicates a previous row where response == "sample", it is not considered duplicated, and the new column is 0.

Thank you very much in advance, I really appreciate it.

On Sat, Jul 27, 2013 at 3:45 AM, arun <smartpink...@yahoo.com> wrote:
>
>
> On some slightly different datasets:
> tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> 1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
> 5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
> tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
> 1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4,
> 2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
> tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
> 6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
> 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
> 1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2,
> 2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
> ), class = "data.frame", row.names = c(NA, 22L))
>
>
> #Tried David's solution:
> tt1$rown <- rownames(tt1)
> as.numeric ( apply(tt1, 1, function(x) {
>   x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response ==
>   "buy", "product"] } ) )
> #gave inconsistent results, especially since the first 10 rows are the same as `tt`
> # [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
>
> #similarly for `tt2` and `tt3`.
>
>
> ##Created this function. It seems to work in the tested cases,
> ##though it is not tested extensively.
> fun1<- function(dat,colName,newColumn){
>   indx<- which(dat[,colName]=="buy")
>   dat[,newColumn]<-0
>   dat[unlist(lapply(seq_along(indx),function(i){
>     x1<- if(i==length(indx)){
>       seq(indx[i],nrow(dat))
>     }
>     else if((indx[i+1]-indx[i])==1){
>       indx[i]
>     }
>     else {
>       seq(indx[i]+1,indx[i+1]-1)
>     }
>     x2<- dat[unique(c(indx[i:1],x1)),]
>     x3<- subset(x2,response=="sample")
>     x4<- subset(x2,response=="buy")
>     if(nrow(x3)!=0) {
>       row.names(x3)[x3$product%in% x4$product]
>     }
>
>   })),newColumn]<-1
>   dat
>
> }
> fun1(tt,"response","newCol")
> #   subj response product rown newCol
> #1     1   sample       1    1      0
> #2     1   sample       2    2      0
> #3     1      buy       3    3      0
> #4     2   sample       2    4      0
> #5     2      buy       2    5      0
> #6     3   sample       3    6      1
> #7     3   sample       2    7      1
> #8     3      buy       1    8      0
> #9     4   sample       1    9      1
> #10    4      buy       4   10      0
>
> fun1(tt1,"response","newCol")
> #   subj response product newCol
> #1     1   sample       1      0
> #2     1   sample       2      0
> #3     1      buy       3      0
> #4     2   sample       2      0
> #5     2      buy       2      0
> #6     3   sample       3      1
> #7     3   sample       2      1
> #8     3      buy       1      0
> #9     4   sample       1      1
> #10    4      buy       4      0
> #11    5      buy       4      0
> #12    5   sample       2      1
> #13    5      buy       2      0
> #14    6      buy       4      0
> #15    6   sample       5      0
> #16    6   sample       5      0
> #17    7   sample       4      1
> #18    7      buy       3      0
> #19    7      buy       4      0
> #20    8      buy       5      0
> #21    8   sample       4      1
> #22    8      buy       2      0
> #Also
> fun1(tt2,"response","newCol")
> fun1(tt3,"response","newCol")
> A.K.
>
> P.S. Below is OP's clarification regarding the conditional statement in a
> private message:
>
> I am sorry I didn't phrase the question very clearly. Let me restate the
> conditional statement; I hope you can understand.
> I will explain by example.
>
> As you can see, almost every number is duplicated, but only in rows
> 6, 7, and 9 is the value in the new column 1.
>
> In row 4, the value is duplicated (2 already occurred in row 2), but
> since a value is considered duplicated only if it duplicates a row
> where the response is 'buy', the value in the new column for row 4 is
> still zero.
>
> In row 6, the value in the product column is 3. 3 already occurred
> in row 3, where the value of response is 'buy', so the value in the
> new column should be 1.
>
> I hope you can understand the conditional statement.
>
>
> ----- Original Message -----
> From: David Winsemius <dwinsem...@comcast.net>
> To: David Winsemius <dwinsem...@comcast.net>
> Cc: R-help@r-project.org; Uwe Ligges <lig...@statistik.tu-dortmund.de>
> Sent: Friday, July 26, 2013 5:16 PM
> Subject: Re: [R] Duplicated function with conditional statement
>
>
> On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:
>
> >
> > On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
> >
> >>
> >>
> >> On 25.07.2013 21:05, vanessa van der vaart wrote:
> >>> Hi everybody,
> >>> I have a question about the R function duplicated(). I have spent
> >>> days trying to figure this out, but I can't find any solution yet.
> >>> I hope somebody can help me.
> >>> This is my data:
> >>>
> >>> subj=c(1,1,1,2,2,3,3,3,4,4)
> >>> response=c('sample','sample','buy','sample','buy','sample','sample','buy','sample','buy')
> >>> product=c(1,2,3,2,2,3,2,1,1,4)
> >>> tt=data.frame(subj, response, product)
> >>>
> >>> The data look like this:
> >>>
> >>>    subj response product
> >>> 1     1   sample       1
> >>> 2     1   sample       2
> >>> 3     1      buy       3
> >>> 4     2   sample       2
> >>> 5     2      buy       2
> >>> 6     3   sample       3
> >>> 7     3   sample       2
> >>> 8     3      buy       1
> >>> 9     4   sample       1
> >>> 10    4      buy       4
> >>>
> >>> I want to create a new column based on the values in the response
> >>> and product columns. If the value in product is duplicated, then the
> >>> value in the new column is 1, otherwise it is 0.
> >>
> >>
> >> According to your description:
> >>
> >
> > Agree that the description did not match the output. I tried to
> > match the output using a rule that could be expressed as:
> >
> > if( a "buy"-associated "product" value precedes the current "product" value ){1}else{0}
> >
> > So this delivers the specified output:
> >
> > tt$rown <- rownames(tt)
> > as.numeric ( apply(tt, 1, function(x) {
> >   x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response ==
> >   "buy", "product"] } ) )
> >
> > # [1] 0 0 0 0 0 1 1 0 1 0
> >
> >
> > --
> > David.
> >
> >
> >> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
> >>
> >> which is different from what you show us below, from which I cannot
> >> derive any systematic rule.
> >>
> >> Uwe Ligges
> >>
> >>> But I want to add a conditional statement: the value in the product
> >>> column will only be considered duplicated if the value in the
> >>> response column is 'buy'.
> >>> For illustration, the table should look like this:
> >>>
> >>>    subj response product newcolumn
> >>> 1     1   sample       1         0
> >>> 2     1   sample       2         0
> >>> 3     1      buy       3         0
> >>> 4     2   sample       2         0
> >>> 5     2      buy       2         0
> >>> 6     3   sample       3         1
> >>> 7     3   sample       2         1
> >>> 8     3      buy       1         0
> >>> 9     4   sample       1         1
> >>> 10    4      buy       4         0
> >>>
> >>>
> >>> Can somebody help me?
> >>> Any help will be appreciated.
> >>> I am new to this mailing list, so forgive me in advance if I did not
> >>> ask the question appropriately.
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
>
> David Winsemius
> Alameda, CA, USA
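A minimal sketch of the clarified rule at the top of this message: a row gets 1 if its product value already appeared in an earlier row whose response is "buy" (for "buy" and "sample" rows alike), otherwise 0. It assumes the rows are already in time order and that matches are counted across the whole data frame rather than within each subj; the function name flag_prior_buy() and its arguments are hypothetical, not from the thread. It compares integer row positions instead of row names: apply() first converts the data frame to a character matrix, so rownames(tt1) < x['rown'] is a string comparison in which "10" < "2" is TRUE, which is a likely reason David's one-liner went wrong on tt1 once there were more than 9 rows.

```r
# Hypothetical helper (not from the thread): flag_prior_buy() returns 1 for a
# row whose product value already occurred in an EARLIER row with
# response == "buy", and 0 otherwise. It applies to "buy" and "sample" rows
# alike. Assumes rows are in chronological order and that matching is done
# across the whole data frame, not within each subj.
flag_prior_buy <- function(dat, response_col = "response",
                           product_col = "product", buy_value = "buy") {
  product  <- dat[[product_col]]
  # as.character() so this works whether response is a factor or character
  buy_rows <- which(as.character(dat[[response_col]]) == buy_value)
  # For every row, the position of the FIRST "buy" row with the same product
  # (NA if that product was never bought); match() returns the first hit.
  first_buy <- buy_rows[match(product, product[buy_rows])]
  # Compare integer row positions, not row names: row names are strings,
  # and "10" < "2" is TRUE as a string comparison.
  as.integer(!is.na(first_buy) & first_buy < seq_along(product))
}

# The original example from the thread
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)
tt$newcolumn <- flag_prior_buy(tt)
tt$newcolumn
# [1] 0 0 0 0 0 1 1 0 1 0

# tt1 from arun's message (more than 9 rows)
tt1 <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = factor(c("sample", "sample", "buy", "sample", "buy", "sample",
                      "sample", "buy", "sample", "buy", "buy", "sample",
                      "buy", "buy", "sample", "sample", "sample", "buy",
                      "buy", "buy", "sample", "buy")),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)
tt1$newCol <- flag_prior_buy(tt1)
tt1$newCol
# [1] 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1
```

On tt1 this differs from fun1's output only in the "buy" rows 11, 13, 14, 18, 19 and 22, which fun1 never flags because it only assigns 1 to "sample" rows. Whether "buy" rows should be flagged is exactly what the clarification above is about, so treat this as one reading of it rather than a settled answer.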