Hi Srecko,

Try this:

dat1<- read.table(text="
id module event time time_on_task Categ url
1 sys login 1373502892 80 B http://
2 task add 1373502892 80 A http://post/add?id=33&idp=67
3 task add 1373502972 23 A http://post/add?id=34&idp=67
4 sys login 1373502892 80 B http://
5 list delete 1373502995 901 C http://
6 list view 1373503896 100 D http://
7 task add 1373503996 NA A http://post/add?id=35&idp=99
",sep="",header=TRUE,stringsAsFactors=FALSE)

vec1<-as.numeric(gsub(".*\\?.*=(\\d+)\\&.*","\\1",dat1$url[dat1$Categ=="A"]))
dat2<- read.table(text="
id idpost idtopic iduser
1 45 33 101
2 46 34 102
3 47 33 103
4 48 33 101
5 49 35 104
",sep="",header=TRUE)

student_list<- c(101:102,104:107)
vec2<-with(dat2,tapply(iduser,list(idtopic),FUN=function(x) all(x%in% student_list)))
dat1$Categ[dat1$Categ=="A"][match(vec1,as.numeric(names(vec2)))[!vec2]]<-"F"
dat1
# id module event time time_on_task Categ url
#1 1 sys login 1373502892 80 B http://
#2 2 task add 1373502892 80 F http://post/add?id=33&idp=67
#3 3 task add 1373502972 23 A http://post/add?id=34&idp=67
#4 4 sys login 1373502892 80 B http://
#5 5 list delete 1373502995 901 C http://
#6 6 list view 1373503896 100 D http://
#7 7 task add 1373503996 NA A http://post/add?id=35&idp=99
A.K.

________________________________
From: srecko joksimovic <sreckojoksimo...@gmail.com>
To: arun <smartpink...@yahoo.com>
Sent: Thursday, August 29, 2013 6:04 PM
Subject: Re: [R] Add new calculated column to data frame

"Did you mean to separate the number 33 from the link?", yes that is correct.

It should be something like this:

# id module event time time_on_task Categ url
#1 1 sys login 1373502892 80 B http://
#2 2 task add 1373502892 80 A http://post/add?id=33&idp=67
#3 3 task add 1373502972 23 A http://post/add?id=34&idp=67
#4 4 sys login 1373502892 80 B http://
#5 5 list delete 1373502995 901 C http://
#6 6 list view 1373503896 100 D http://
#7 7 task add 1373503996 NA A http://post/add?id=35&idp=99

from this table I should get 3 rows with 3 URLs:
http://post/add?id=33&idp=67, http://post/add?id=34&idp=67, and http://post/add?id=35&idp=99

For each of them, I need to extract id (33, 34, and 35). Once I do that, I need to obtain users from this table:

id idpost idtopic iduser
1 45 33 101
2 46 34 102
3 47 33 103
4 48 33 101
5 49 35 104

again, for each id.
This means:
id = 33 => 101, 103
id = 34 => 102
id = 35 => 104

Next, for each vector I need to check whether or not all its values are in the students list (101, 102, 104, 105, 106, 107):
id = 33 => FALSE (since 103 is not in the list)
id = 34 => TRUE
id = 35 => TRUE

This means that the category for row 2 in the first table is not A any more, but F...

Thanks,
Srecko

On Thu, Aug 29, 2013 at 2:56 PM, arun <smartpink...@yahoo.com> wrote:

Hi Srecko,
>Did you mean to separate the number 33 from the link? Could you provide a
>reproducible example with the output you expected?
>Tx.
>
>Arun
>
>________________________________
>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Sent: Thursday, August 29, 2013 5:38 PM
>Subject: Re: [R] Add new calculated column to data frame
>
>Hi Arun,
>
>I really appreciate your help, and we did a great job :)
>but, now I think that R can do anything, so I'd like to try one more thing, if
>you don't mind...
>
>from the table with categories,
>
># id module event time time_on_task Categ url
>#1 1 sys login 1373502892 80 B http:
>#2 2 task add 1373502892 80 A http:
>#3 3 task add 1373502972 23 A http:
>#4 4 sys login 1373502892 80 B http:
>#5 5 list delete 1373502995 901 C
>#6 6 list view 1373503896 100 D
>#7 7 task add 1373503996 NA A
>
>I'd like to use only a certain category (for example A). Each of these fields
>has a url whose format is something like http://post/add?id=33&idp=45. First
>step would be to extract this id (33 in this case). Based on that value, I
>want to find all "iduser" from the following table:
>
>id idpost idtopic iduser
>1 45 33 101
>2 46 34 102
>3 47 33 103
>4 48 33 101
>5 49 35 104
>
>The next step would be to check if at least one of these values (iduser) is
>not in the vector "users" (only ids). If that is the case, I want to change
>category to F, if not, I want to keep the same category.
>
>If this is too much for one question, I'll implement this in Java, but I'd
>really like to try this with R. Maybe this id extraction from url is the most
>important problem... I tried most of these steps, but I am still not able to put
>them all together...
>
>Thank you so much for your time.
>Srecko
>
>On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink...@yahoo.com> wrote:
>
>Hi Srecko,
>>No problem.
>>
>>Arun
>>
>>________________________________
>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>To: arun <smartpink...@yahoo.com>
>>Sent: Thursday, August 29, 2013 3:19 PM
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>This is great Arun, thank you again.
>>
>>I was thinking to use sqldf and issue a query for each module-action
>>combination, but this is much better. Since I have a table with categories
>>(module, action, category), I could create vector "levels" based on the first
>>two columns and vector "labels" based on the category column and that should
>>do the work...
>>
>>Best,
>>Srecko
>>
>>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink...@yahoo.com> wrote:
>>
>>Hi Srecko,
>>>
>>>You didn't mention the order in which the letters are assigned. If you need
>>>a different order, just change the order in the ",levels=c(....),".
>>>Arun
>>>
>>>----- Original Message -----
>>>From: arun <smartpink...@yahoo.com>
>>>To: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>Cc: R help <r-help@r-project.org>
>>>Sent: Thursday, August 29, 2013 3:13 PM
>>>Subject: Re: [R] Add new calculated column to data frame
>>>
>>>Hi,
>>>You could try this:
>>>dat1<- read.table(text="
>>>id module event time time_on_task
>>>1 sys login 1373502892 80
>>>2 task add 1373502892 80
>>>3 task add 1373502972 23
>>>4 sys login 1373502892 80
>>>5 list delete 1373502995 901
>>>6 list view 1373503896 100
>>>7 task add 1373503996 NA
>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>dat1$Categ<-as.character(factor(with(dat1,paste(module,event,sep="_")),levels=c("task_add","sys_login","list_delete","list_view"),labels=LETTERS[1:4]))
>>>
>>>dat1
>>># id module event time time_on_task Categ
>>>#1 1 sys login 1373502892 80 B
>>>#2 2 task add 1373502892 80 A
>>>#3 3 task add 1373502972 23 A
>>>#4 4 sys login 1373502892 80 B
>>>#5 5 list delete 1373502995 901 C
>>>#6 6 list view 1373503896 100 D
>>>#7 7 task add 1373503996 NA A
>>>A.K.
>>>
>>>________________________________
>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>To: arun <smartpink...@yahoo.com>
>>>Cc: R help <R-help@r-project.org>
>>>Sent: Thursday, August 29, 2013 2:34 PM
>>>Subject: Re: [R] Add new calculated column to data frame
>>>
>>>Hi Arun,
>>>
>>>There is one more question... you explained to me how to use
>>>split(dat1,cumsum(dat1$action=="login")) in one of the previous questions, and
>>>that is great.
>>>Now, if I have something like this:
>>>
>>>id module event time time_on_task
>>>1 sys login 1373502892 80
>>>2 task add 1373502892 80
>>>3 task add 1373502972 23
>>>4 sys login 1373502892 80
>>>5 list delete 1373502995 901
>>>6 list view 1373503896 100
>>>7 task add 1373503996 NA
>>>I know how to split at each "login" occurrence, and I know how to add a new
>>>column with time differences.
>>>But how do I add a new column "category" which
>>>will be calculated based on columns "module" and "event"? For example if
>>>module=task and event=add => category= A...
>>>
>>>Srecko
>>>
>>>On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink...@yahoo.com> wrote:
>>>
>>>Hi Srecko,
>>>>No problem.
>>>>Regards,
>>>>Arun
>>>>
>>>>________________________________
>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>To: arun <smartpink...@yahoo.com>
>>>>Sent: Thursday, August 29, 2013 2:22 PM
>>>>Subject: Re: [R] Add new calculated column to data frame
>>>>
>>>>Sorry... I should have figured it out...
>>>>
>>>>thanks so much!
>>>>Srecko
>>>>
>>>>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink...@yahoo.com> wrote:
>>>>
>>>>Hi,
>>>>>The one you showed is:
>>>>>
>>>>>dat1$time_on_task<- c(diff(dat1$time),NA)
>>>>>
>>>>> dat1
>>>>># id event time time_on_task
>>>>>#1 1 add 1373502892 80
>>>>>#2 2 add 1373502972 23
>>>>>#3 3 delete 1373502995 901
>>>>>#4 4 view 1373503896 100
>>>>>#5 5 add 1373503996 NA
>>>>>
>>>>>________________________________
>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>>To: arun <smartpink...@yahoo.com>
>>>>>Cc: R help <r-help@r-project.org>
>>>>>Sent: Thursday, August 29, 2013 2:15 PM
>>>>>Subject: Re: [R] Add new calculated column to data frame
>>>>>
>>>>>Thanks Arun,
>>>>>
>>>>>this is great. However, it should be just a little bit different:
>>>>>
>>>>># id event time time_on_task
>>>>>#1 1 add 1373502892 80
>>>>>#2 2 add 1373502972 23
>>>>>#3 3 delete 1373502995 901
>>>>>#4 4 view 1373503896 100
>>>>>#5 5 add 1373503996 NA
>>>>>
>>>>>When I calculate the difference, I need to know how long each activity was. It
>>>>>is id2-id1 for the first activity...
>>>>>
>>>>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>>>>>
>>>>>>Hi,
>>>>>>Try:
>>>>>>dat1<- read.table(text="
>>>>>>id event time
>>>>>>1 add 1373502892
>>>>>>2 add 1373502972
>>>>>>3 delete 1373502995
>>>>>>4 view 1373503896
>>>>>>5 add 1373503996
>>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>>> dat1$time_on_task<- c(NA,diff(dat1$time))
>>>>>> dat1
>>>>>># id event time time_on_task
>>>>>>#1 1 add 1373502892 NA
>>>>>>#2 2 add 1373502972 80
>>>>>>#3 3 delete 1373502995 23
>>>>>>#4 4 view 1373503896 901
>>>>>>#5 5 add 1373503996 100
>>>>>>
>>>>>>#Not sure whether this depends on the values of "event" or not..
>>>>>>A.K.
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>>>To: R help <R-help@r-project.org>
>>>>>>Cc:
>>>>>>Sent: Thursday, August 29, 2013 1:52 PM
>>>>>>Subject: [R] Add new calculated column to data frame
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>I have the following data set:
>>>>>>id event time (in sec)
>>>>>>1 add 1373502892
>>>>>>2 add 1373502972
>>>>>>3 delete 1373502995
>>>>>>4 view 1373503896
>>>>>>5 add 1373503996
>>>>>>...
>>>>>>
>>>>>>I'd like to add a new column "time on task" which is the time elapsed between
>>>>>>two events (id2 - id1...). What would be the best approach to do that?
>>>>>>
>>>>>>Thanks,
>>>>>>Srecko
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>>______________________________________________
>>>>>>R-help@r-project.org mailing list
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.
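The steps discussed in this thread (time on task, category from module and event, id extraction from the url, and the check against the student list) can be put together roughly as below. This is only a sketch under some assumptions, not the code posted in the thread: the column name "topic" and the variables "all_students", "idx", "ok" and "flag" are made up for illustration, the url pattern is assumed to always carry the topic as an "id=" parameter, and the per-row lookup with match() is a swapped-in alternative to the nested index used in the reply, chosen so that row order and repeated topics should not matter.

```r
# Time on task: duration of each event = next timestamp minus its own timestamp.
time <- c(1373502892, 1373502972, 1373502995, 1373503896, 1373503996)
time_on_task <- c(diff(time), NA)

# Data copied from the thread (Categ is rebuilt below).
dat1 <- read.table(text = "
id module event time time_on_task url
1 sys login 1373502892 80 http://
2 task add 1373502892 80 http://post/add?id=33&idp=67
3 task add 1373502972 23 http://post/add?id=34&idp=67
4 sys login 1373502892 80 http://
5 list delete 1373502995 901 http://
6 list view 1373503896 100 http://
7 task add 1373503996 NA http://post/add?id=35&idp=99
", header = TRUE, stringsAsFactors = FALSE)

dat2 <- read.table(text = "
id idpost idtopic iduser
1 45 33 101
2 46 34 102
3 47 33 103
4 48 33 101
5 49 35 104
", header = TRUE)

student_list <- c(101:102, 104:107)

# Category from module + event, using the same factor() approach as in the thread.
dat1$Categ <- as.character(factor(
  paste(dat1$module, dat1$event, sep = "_"),
  levels = c("task_add", "sys_login", "list_delete", "list_view"),
  labels = LETTERS[1:4]))

# Topic id from the url; rows whose url has no "id=" parameter stay NA.
has_id <- grepl("[?&]id=[0-9]+", dat1$url)
dat1$topic <- NA_integer_
dat1$topic[has_id] <- as.integer(
  sub(".*[?&]id=([0-9]+).*", "\\1", dat1$url[has_id]))

# Per topic: are all posting users in the student list?
all_students <- tapply(dat2$iduser %in% student_list, dat2$idtopic, all)

# Look up each row's topic, then flag category-A rows whose topic has an outsider.
idx  <- match(dat1$topic, as.integer(names(all_students)))
ok   <- as.vector(all_students)[idx]
flag <- dat1$Categ == "A" & !is.na(ok) & !ok
dat1$Categ[flag] <- "F"

dat1
```

Rows whose topic does not appear in dat2 keep their category here, since "ok" is NA for them; whether that is the desired behaviour was not discussed in the thread.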