On some slightly different datasets:

tt1 <- structure(list(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
    2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L),
    .Label = c("buy", "sample"), class = "factor"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)),
  .Names = c("subj", "response", "product"),
  class = "data.frame", row.names = c(NA, 22L))
tt2 <- structure(list(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
    2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L),
    .Label = c("buy", "sample"), class = "factor"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4, 2, 3, 3, 2, 5, 3, 4)),
  .Names = c("subj", "response", "product"),
  class = "data.frame", row.names = c(NA, 22L))

tt3 <- structure(list(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
    2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L),
    .Label = c("buy", "sample"), class = "factor"),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2, 2, 2, 2, 4, 3, 2, 5)),
  .Names = c("subj", "response", "product"),
  class = "data.frame", row.names = c(NA, 22L))

# Tried David's solution:
tt1$rown <- rownames(tt1)
as.numeric(apply(tt1, 1, function(x) {
  x['product'] %in% tt1[rownames(tt1) < x['rown'] & tt1$response == "buy", "product"]
}))
# It gave inconsistent results, even though the first 10 rows are the same as in `tt`:
# [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
# Similarly for `tt2` and `tt3`.

## Created this function. It seems to work in the tested cases, though it has not been tested extensively.
fun1 <- function(dat, colName, newColumn) {
  # Row positions of all "buy" rows
  indx <- which(dat[, colName] == "buy")
  dat[, newColumn] <- 0
  dat[unlist(lapply(seq_along(indx), function(i) {
    # Rows belonging to the i-th "buy": up to the next "buy", or to the end
    x1 <- if (i == length(indx)) {
      seq(indx[i], nrow(dat))
    } else if ((indx[i + 1] - indx[i]) == 1) {
      indx[i]
    } else {
      seq(indx[i] + 1, indx[i + 1] - 1)
    }
    # All "buy" rows so far, plus the rows in x1
    x2 <- dat[unique(c(indx[i:1], x1)), ]
    # Note: the column names `response` and `product` are hardcoded below
    x3 <- subset(x2, response == "sample")
    x4 <- subset(x2, response == "buy")
    if (nrow(x3) != 0) {
      row.names(x3)[x3$product %in% x4$product]
    }
  })), newColumn] <- 1
  dat
}

fun1(tt, "response", "newCol")
#    subj response product rown newCol
# 1     1   sample       1    1      0
# 2     1   sample       2    2      0
# 3     1      buy       3    3      0
# 4     2   sample       2    4      0
# 5     2      buy       2    5      0
# 6     3   sample       3    6      1
# 7     3   sample       2    7      1
# 8     3      buy       1    8      0
# 9     4   sample       1    9      1
# 10    4      buy       4   10      0

fun1(tt1, "response", "newCol")
#    subj response product newCol
# 1     1   sample       1      0
# 2     1   sample       2      0
# 3     1      buy       3      0
# 4     2   sample       2      0
# 5     2      buy       2      0
# 6     3   sample       3      1
# 7     3   sample       2      1
# 8     3      buy       1      0
# 9     4   sample       1      1
# 10    4      buy       4      0
# 11    5      buy       4      0
# 12    5   sample       2      1
# 13    5      buy       2      0
# 14    6      buy       4      0
# 15    6   sample       5      0
# 16    6   sample       5      0
# 17    7   sample       4      1
# 18    7      buy       3      0
# 19    7      buy       4      0
# 20    8      buy       5      0
# 21    8   sample       4      1
# 22    8      buy       2      0

# Also
fun1(tt2, "response", "newCol")
fun1(tt3, "response", "newCol")

A.K.

P.S. Below is the OP's clarification regarding the conditional statement, sent in a private message:

I am sorry I didn't ask the question very clearly. Let me restate the
conditional statement; I hope you can understand. I will explain by example.
As you can see, almost every number is duplicated, but only in the 6th, 7th,
and 9th rows is the value in the new column 1. In the 4th row the value is
duplicated (2 already occurred in the 2nd row), but since a value is
considered duplicated only if it was duplicated where the response is 'buy',
the value in the new column for the 4th row is still zero. In the 6th row,
the value in the product column is 3. 3 already occurred in the 3rd row,
where the response is 'buy', so the value in the new column should be 1.
I hope the conditional statement is understandable.
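The rule in the clarification above (a row gets 1 when its product value already appeared in an earlier row whose response is 'buy') can also be sketched with numeric row positions instead of row names. This is only a sketch under that reading of the rule: flag_prior_buy() and its argument names are hypothetical and appear nowhere else in this thread. It also avoids what appears to be the cause of the inconsistent results with more than 9 rows: rownames(tt1) < x['rown'] compares character strings, so "10" sorts before "2".

```r
# Sketch only: flag_prior_buy() and its arguments are hypothetical names,
# not part of David's solution or of fun1().
flag_prior_buy <- function(dat, response = "response", product = "product") {
  is_buy <- dat[[response]] == "buy"
  prod   <- dat[[product]]
  vapply(seq_len(nrow(dat)), function(i) {
    earlier <- seq_len(i - 1L)   # numeric positions of the rows above row i
    # 1 if the current product was already seen in an earlier "buy" row
    as.integer(prod[i] %in% prod[earlier][is_buy[earlier]])
  }, integer(1))
}

# The original 10-row example from the thread
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)
tt$newcolumn <- flag_prior_buy(tt)
tt$newcolumn
# [1] 0 0 0 0 0 1 1 0 1 0

# Character vs. numeric comparison of row labels
"10" < "2"   # TRUE in the usual collation orders
10 < 2       # FALSE
```

Note that this sketch also flags 'buy' rows whose product was bought earlier (row 11 of tt1, for example), whereas fun1 only ever flags 'sample' rows. Combining the result with tt1$response == "sample" appears to reproduce the fun1 output shown for tt1; which of the two behaviours the OP wants is not settled by the 10-row example, where they agree.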
----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
To: David Winsemius <dwinsem...@comcast.net>
Cc: R-help@r-project.org; Uwe Ligges <lig...@statistik.tu-dortmund.de>
Sent: Friday, July 26, 2013 5:16 PM
Subject: Re: [R] Duplicated function with conditional statement

On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

>
> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
>
>>
>>
>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>> Hi everybody,,
>>> I have a question about R function duplicated(). I have spent days try to
>>> figure this out,but I cant find any solution yet. I hope somebody can help
>>> me..
>>> this is my data:
>>>
>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>> response=c('sample','sample','buy','sample','buy','sample',
>>>            'sample','buy','sample','buy')
>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>> tt=data.frame(subj, response, product)
>>>
>>> the data look like this:
>>>
>>>    subj response product
>>> 1     1   sample       1
>>> 2     1   sample       2
>>> 3     1      buy       3
>>> 4     2   sample       2
>>> 5     2      buy       2
>>> 6     3   sample       3
>>> 7     3   sample       2
>>> 8     3      buy       1
>>> 9     4   sample       1
>>> 10    4      buy       4
>>>
>>> I want to create new column based on the value on response and product
>>> column. if the value on product is duplicated, then the value on new column
>>> is 1, otherwise is 0.
>>
>>
>> According to your description:
>>
>
> Agree that the description did not match the output. I tried to match the
> output using a rule that could be expressed as:
>
> if( a "buy"- associated "product" value precedes the current "product"
> value){1}else{0}
>

So this delivers the specified output:

tt$rown <- rownames(tt)
as.numeric(apply(tt, 1, function(x) {
  x['product'] %in% tt[rownames(tt) < x['rown'] & tt$response == "buy", "product"]
}))
# [1] 0 0 0 0 0 1 1 0 1 0

> --
> David.
>
>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>>
>> which is different from what you show us below, where I cannot derive any
>> systematic rule from.
>>
>> Uwe Ligges
>>
>>> but I want to add conditional statement that the value on product column
>>> will only be considered as duplicated if the value on response column is
>>> 'buy'.
>>> for illustration, the table should look like this:
>>>
>>>    subj response product newcolumn
>>> 1     1   sample       1         0
>>> 2     1   sample       2         0
>>> 3     1      buy       3         0
>>> 4     2   sample       2         0
>>> 5     2      buy       2         0
>>> 6     3   sample       3         1
>>> 7     3   sample       2         1
>>> 8     3      buy       1         0
>>> 9     4   sample       1         1
>>> 10    4      buy       4         0
>>>
>>>
>>> can somebody help me?
>>> any help will be appreciated.
>>> I am new in this mailing list, so forgive me in advance, If I did not ask
>>> the question appropriately.
>>>
>>>     [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA