Hi Srecko,
Try this:
dat1<- read.table(text="
id module  event       time time_on_task Categ    url
1    sys  login 1373502892           80     B    http://
2   task    add 1373502892           80     A    http://post/add?id=33&idp=67
3   task    add 1373502972           23     A    http://post/add?id=34&idp=67
4    sys  login 1373502892           80     B    http://
5   list delete 1373502995          901     C    http://
6   list   view 1373503896          100     D    http://
7   task    add 1373503996           NA     A    http://post/add?id=35&idp=99
",sep="",header=TRUE,stringsAsFactors=FALSE)
vec1<-as.numeric(gsub(".*\\?.*=(\\d+)\\&.*","\\1",dat1$url[dat1$Categ=="A"]))

dat2<- read.table(text="
id idpost idtopic iduser
1   45      33       101
2   46      34       102
3   47      33       103
4   48      33       101
5   49      35       104
",sep="",header=TRUE)
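student_list<- c(101:102,104:107)
# TRUE for a topic only if every user who posted in it is in student_list
vec2<-with(dat2,tapply(iduser,list(idtopic),FUN=function(x) all(x%in% student_list)))
# look up the flag of each category-A row by its topic id; which() drops any NA flags
dat1$Categ[dat1$Categ=="A"][which(!vec2[as.character(vec1)])]<-"F"
dat1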
#  id module  event       time time_on_task Categ                          url
#1  1    sys  login 1373502892           80     B                      http://
#2  2   task    add 1373502892           80     F http://post/add?id=33&idp=67
#3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
#4  4    sys  login 1373502892           80     B                      http://
#5  5   list delete 1373502995          901     C                      http://
#6  6   list   view 1373503896          100     D                      http://
#7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
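
Note on the id extraction: the gsub() pattern used for vec1 assumes that the wanted number is directly followed by "&". A minimal sketch of a more tolerant extraction, assuming the parameter is always literally named id. The helper name extract_id and the sample vector urls are hypothetical, used only for illustration and not part of the thread:

```r
# Hypothetical helper: pull the numeric value of the "id" query parameter
# out of each URL, returning NA where no such parameter is present.
extract_id <- function(url) {
  # regexec() records the full match and the captured group for each URL
  m <- regmatches(url, regexec("[?&]id=([0-9]+)", url))
  # a URL without an id parameter yields a zero-length match
  vapply(m, function(x) if (length(x) == 2L) as.numeric(x[2]) else NA_real_,
         numeric(1))
}

# Illustrative input (assumed, not from the original data)
urls <- c("http://",
          "http://post/add?id=33&idp=67",
          "http://post/add?idp=99&id=35",
          "http://post/add?id=34")
extract_id(urls)
# [1] NA 33 35 34
```

Because URLs without an id come back as NA, the helper can be applied to the whole url column rather than only to the rows of one category.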

A.K.

________________________________
From: srecko joksimovic <sreckojoksimo...@gmail.com>
To: arun <smartpink...@yahoo.com> 
Sent: Thursday, August 29, 2013 6:04 PM
Subject: Re: [R] Add new calculated column to data frame



"Did you mean to separate the number 33 from the link? ", yes that is correct. 
It should be something like this:


#  id module  event       time time_on_task Categ    url
#1  1    sys  login 1373502892           80     B    http://
#2  2   task    add 1373502892           80     A    http://post/add?id=33&idp=67
#3  3   task    add 1373502972           23     A    http://post/add?id=34&idp=67
#4  4    sys  login 1373502892           80     B    http://
#5  5   list delete 1373502995          901     C    http://
#6  6   list   view 1373503896          100     D    http://
#7  7   task    add 1373503996           NA     A    http://post/add?id=35&idp=99

from this table I should get 3 rows with 3 URLs: http://post/add?id=33&idp=67, 
http://post/add?id=34&idp=67, and http://post/add?id=35&idp=99
For each of them, I need to extract id (33, 34, and 35). Once I do that, I need 
to obtain users from this table:
id idpost idtopic iduser
1   45      33       101
2   46      34       102
3   47      33       103
4   48      33       101
5   49      35       104

again, for each id. This means: 
id = 33 => 101, 103
id = 34 => 102
id = 35 => 104


Next, for each vector I need to check whether or not all its values are in the 
student list (101, 102, 104, 105, 106, 107):

id = 33 => FALSE (since 103 is not in the list)
id = 34 => TRUE
id = 35 => TRUE


This means that category for row 2 in the first table is not A any more, but 
F...
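
The steps described above can be sketched one at a time as follows. This is only an illustration, assuming the second table is held in a data frame. The names students, users_by_topic, all_students and topic_of_row are made up for this sketch:

```r
# Second table from the message, rebuilt as a data frame
dat2 <- data.frame(id = 1:5, idpost = 45:49,
                   idtopic = c(33, 34, 33, 33, 35),
                   iduser  = c(101, 102, 103, 101, 104))
students <- c(101, 102, 104, 105, 106, 107)

# Step 1: collect the users who posted under each topic id
# (repeated users are kept, so topic 33 gives 101, 103, 101)
users_by_topic <- split(dat2$iduser, dat2$idtopic)

# Step 2: TRUE when every user of a topic is in the student list
all_students <- vapply(users_by_topic,
                       function(u) all(u %in% students), logical(1))
all_students
#    33    34    35 
# FALSE  TRUE  TRUE 

# Step 3: look up the flag for each category-A row by its topic id
topic_of_row <- c(33, 34, 35)   # ids taken from the three URLs
categ <- unname(ifelse(all_students[as.character(topic_of_row)], "A", "F"))
categ
# [1] "F" "A" "A"
```

Looking the flag up by name keeps the result tied to each row's own topic id, so the order of the rows and repeated topics do not matter.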

Thanks,
Srecko





On Thu, Aug 29, 2013 at 2:56 PM, arun <smartpink...@yahoo.com> wrote:

Hi Srecko,
>Did you mean to separate the number 33 from the link? Could you provide a 
>reproducible example with the output you expected?
>Tx.
>
>
>Arun
>
>
>
>
>
>________________________________
>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Sent: Thursday, August 29, 2013 5:38 PM
>
>Subject: Re: [R] Add new calculated column to data frame
>
>
>
>Hi Arun,
>
>I really appreciate your help, and we did a great job :)
>but, now I think that R can do anything, so I'd like to try one more thing, if 
>you don't mind...
>
>from the table with categories, 
>
>#  id module  event       time time_on_task Categ    url
>#1  1    sys  login 1373502892           80     B         http:
>#2  2   task    add 1373502892           80     A         http:
>#3  3   task    add 1373502972           23     A         http:
>#4  4    sys  login 1373502892           80     B          http:
>#5  5   list delete 1373502995          901     C
>#6  6   list   view 1373503896          100     D
>#7  7   task    add 1373503996           NA     A
>
>
>I'd like to use only a certain category (for example A). Each of these rows 
>has a URL whose format is something like http://post/add?id=33&idp=45. The first 
>step would be to extract this id (33 in this case). Based on that value, I 
>want to find all "iduser" values in the following table:
>
>id idpost idtopic iduser
>1   45      33       101
>2   46      34       102
>3   47      33       103
>4   48      33       101
>5   49      35       104
>
>
>The next step would be to check whether at least one of these values (iduser) is 
>not in the vector "users" (which holds only ids). If that is the case, I want to 
>change the category to F; if not, I want to keep the same category.
>
>If this is too much for one question, I'll implement this in Java, but I'd 
>really like to try this with R. Maybe this id extraction from url is the most 
>important problem... I tried most of these steps, but still not able to put 
>them all together...
>
>Thank you so much for your time.
>Srecko
>
>
>
>
>
>
>
>
>On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink...@yahoo.com> wrote:
>
>Hi Srecko,
>>No problem.
>>
>>Arun
>>
>>
>>
>>
>>
>>
>>________________________________
>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>To: arun <smartpink...@yahoo.com>
>>Sent: Thursday, August 29, 2013 3:19 PM
>>
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>
>>
>>This is great Arun, thank you again.
>>
>>I was thinking of using sqldf and issuing a query for each module-action 
>>combination, but this is much better. Since I have a table with categories 
>>(module, action, category), I could create the vector "levels" based on the first 
>>two columns and the vector "labels" based on the category column, and that should 
>>do the work...
>>
>>Best,
>>Srecko
>>
>>
>>
>>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink...@yahoo.com> wrote:
>>
>>Hi Srecko,
>>>
>>>You didn't mention the order in which the letters are assigned.  If you need 
>>>a different order, just change the order in the ",levels=c(....),".
>>>Arun
>>>
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: arun <smartpink...@yahoo.com>
>>>To: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>Cc: R help <r-help@r-project.org>
>>>
>>>Sent: Thursday, August 29, 2013 3:13 PM
>>>Subject: Re: [R] Add new calculated column to data frame
>>>
>>>
>>>
>>>Hi,
>>>You could try this:
>>>dat1<- read.table(text="
>>>id  module    event       time                       time_on_task
>>>1   sys         login         1373502892           80
>>>2   task        add          1373502892           80
>>>3   task        add          1373502972           23
>>>4   sys         login         1373502892           80
>>>5   list         delete       1373502995          901
>>>6   list          view         1373503896          100
>>>7   task        add          1373503996           NA
>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>> 
>>>dat1$Categ<-as.character(factor(with(dat1,paste(module,event,sep="_")),levels=c("task_add","sys_login","list_delete","list_view"),labels=LETTERS[1:4]))
>>>
>>>
>>>dat1
>>>#  id module  event       time time_on_task Categ
>>>#1  1    sys  login 1373502892           80     B
>>>#2  2   task    add 1373502892           80     A
>>>#3  3   task    add 1373502972           23     A
>>>#4  4    sys  login 1373502892           80     B
>>>#5  5   list delete 1373502995          901     C
>>>#6  6   list   view 1373503896          100     D
>>>#7  7   task    add 1373503996           NA     A
>>>A.K.
>>>
>>>________________________________
>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>To: arun <smartpink...@yahoo.com>
>>>Cc: R help <R-help@r-project.org>
>>>Sent: Thursday, August 29, 2013 2:34 PM
>>>Subject: Re: [R] Add new calculated column to data frame
>>>
>>>
>>>
>>>Hi Arun,
>>>
>>>There is one more question... you explained to me how to use 
>>>split(dat1,cumsum(dat1$action=="login")) in one of my previous questions, and 
>>>that is great.
>>>Now, if I have something like this:
>>>
>>>id  module  event   time        time_on_task
>>>1   sys     login   1373502892  80
>>>2   task    add     1373502892  80
>>>3   task    add     1373502972  23
>>>4   sys     login   1373502892  80
>>>5   list    delete  1373502995  901
>>>6   list    view    1373503896  100
>>>7   task    add     1373503996  NA
>>>I know how to split at each "login" occurrence, and I know how to add a new 
>>>column with time differences. But how do I add a new column "category" that 
>>>is calculated from the columns "module" and "event"? For example, if 
>>>module=task and event=add => category=A...
>>>
>>>Srecko
>>>
>>>
>>>
>>>
>>>
>>>On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink...@yahoo.com> wrote:
>>>
>>>Hi Srecko,
>>>>No problem.
>>>>Regards,
>>>>Arun
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>________________________________
>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>To: arun <smartpink...@yahoo.com>
>>>>Sent: Thursday, August 29, 2013 2:22 PM
>>>>
>>>>Subject: Re: [R] Add new calculated column to data frame
>>>>
>>>>
>>>>
>>>>Sorry... I should have figured it out...
>>>>
>>>>thanks so much!
>>>>Srecko
>>>>
>>>>
>>>>
>>>>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink...@yahoo.com> wrote:
>>>>
>>>>Hi,
>>>>>The one you showed is:
>>>>>
>>>>>dat1$time_on_task<- c(diff(dat1$time),NA)
>>>>>
>>>>> dat1
>>>>>#  id  event       time time_on_task
>>>>>#1  1    add 1373502892           80
>>>>>#2  2    add 1373502972           23
>>>>>#3  3 delete 1373502995          901
>>>>>#4  4   view 1373503896          100
>>>>>#5  5    add 1373503996           NA
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>________________________________
>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>>
>>>>>To: arun <smartpink...@yahoo.com>
>>>>>Cc: R help <r-help@r-project.org>
>>>>>Sent: Thursday, August 29, 2013 2:15 PM
>>>>>Subject: Re: [R] Add new calculated column to data frame
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Thanks Arun,
>>>>>
>>>>>this is great. However, it should be just a little bit different:
>>>>>
>>>>>#  id  event       time time_on_task
>>>>>#1  1    add 1373502892           80
>>>>>#2  2    add 1373502972           23
>>>>>#3  3 delete 1373502995          901
>>>>>#4  4   view 1373503896          100
>>>>>#5  5    add 1373503996           NA
>>>>>
>>>>>
>>>>>When I calculate the difference, I need to know how long each activity 
>>>>>lasted. For the first activity that is the time at id 2 minus the time at id 1...
>>>>>
>>>>>
>>>>>
>>>>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>>Hi,
>>>>>>Try:
>>>>>>dat1<- read.table(text="
>>>>>>id    event    time
>>>>>>1    add      1373502892
>>>>>>2    add      1373502972
>>>>>>3    delete   1373502995
>>>>>>4    view     1373503896
>>>>>>5    add      1373503996
>>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>>> dat1$time_on_task<- c(NA,diff(dat1$time))
>>>>>> dat1
>>>>>>#  id  event       time time_on_task
>>>>>>#1  1    add 1373502892           NA
>>>>>>#2  2    add 1373502972           80
>>>>>>#3  3 delete 1373502995           23
>>>>>>#4  4   view 1373503896          901
>>>>>>#5  5    add 1373503996          100
>>>>>>
>>>>>>#Not sure whether this depends on the values of "event" or not..
>>>>>>A.K.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>>>To: R help <R-help@r-project.org>
>>>>>>Cc:
>>>>>>Sent: Thursday, August 29, 2013 1:52 PM
>>>>>>Subject: [R] Add new calculated column to data frame
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>I have the following data set:
>>>>>>id    event    time (in sec)
>>>>>>1     add      1373502892
>>>>>>2     add      1373502972
>>>>>>3     delete   1373502995
>>>>>>4     view      1373503896
>>>>>>5     add       1373503996
>>>>>>...
>>>>>>
>>>>>>I'd like to add a new column "time on task", which is the time elapsed 
>>>>>>between two events (id2 - id1...). What would be the best approach to do that?
>>>>>>
>>>>>>Thanks,
>>>>>>Srecko
>>>>>>
>>>>>>______________________________________________
>>>>>>R-help@r-project.org mailing list
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide 
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

