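Hi,
It's not really clear to me (you didn't show the "users" vector), but you can try this, which changes "A" to "F" when the extracted id is not found in idtopic: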
dat1<- read.table(text="
id module  event       time time_on_task Categ    url
  1    sys  login 1373502892           80     B    http://post/add?id=42&idp=45 
 2   task    add 1373502892           80     A     http://post/add?id=33&idp=45
 3   task    add 1373502972           23     A     http://post/add?id=34&idp=45
 4    sys  login 1373502892           80     B     http://post/add?id=39&idp=42
 5   list delete 1373502995          901     C     http://post/add?id=37&idp=41
 6   list   view 1373503896          100     D     http://post/add?id=36&idp=46
 7   task    add 1373503996           NA     A     http://post/add?id=31&idp=45
",sep="",header=TRUE,stringsAsFactors=FALSE)
# Pull the id value out of each category "A" url (the digits between "=" and "&")
vec1<-as.numeric(gsub(".*\\?.*=(\\d+)\\&.*","\\1",dat1$url[dat1$Categ=="A"]))
 vec1
#[1] 33 34 31
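
If the order of the query parameters might vary, a slightly more targeted pattern could anchor on the parameter name itself. This is only a minimal sketch, assuming the parameter is always called "id" and is preceded by "?" or "&"; extract_id is a hypothetical helper name, not something from the code above.

```r
# Hypothetical helper: pull the numeric value of the "id" query parameter
extract_id <- function(url) {
  # "[?&]id=" matches "id" only where it starts a parameter, so "idp=" is skipped
  as.numeric(sub("^.*[?&]id=(\\d+).*$", "\\1", url))
}

# Two made-up URLs with the parameters in different order
urls <- c("http://post/add?id=33&idp=45",
          "http://post/add?idp=45&id=34")
extract_id(urls)
#[1] 33 34
```

A url without an "id" parameter is returned unchanged by sub(), so as.numeric() would give NA with a warning for it.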

dat2<- read.table(text="
id idpost idtopic iduser
1   45      33       101
2   46      34       102
3   47      33       103
4   48      33       101
5   49      35       104
",sep="",header=TRUE)
# Among the "A" rows, recode those whose id does not occur in dat2$idtopic as "F"
 dat1$Categ[dat1$Categ=="A"][!vec1%in%dat2$idtopic]<-"F"
 dat1
#  id module  event       time time_on_task Categ                          url
#1  1    sys  login 1373502892           80     B http://post/add?id=42&idp=45
#2  2   task    add 1373502892           80     A http://post/add?id=33&idp=45
#3  3   task    add 1373502972           23     A http://post/add?id=34&idp=45
#4  4    sys  login 1373502892           80     B http://post/add?id=39&idp=42
#5  5   list delete 1373502995          901     C http://post/add?id=37&idp=41
#6  6   list   view 1373503896          100     D http://post/add?id=36&idp=46
#7  7   task    add 1373503996           NA     F http://post/add?id=31&idp=45
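
If the check is really meant to go through iduser, as described in the question, the steps might be sketched roughly as below. The "users" vector is not shown anywhere in the thread, so the values used here are made up purely for illustration, and the data frames are rebuilt with only the columns needed.

```r
# Sample data from the thread (only the columns needed here)
dat1 <- data.frame(
  id    = 1:7,
  Categ = c("B", "A", "A", "B", "C", "D", "A"),
  url   = paste0("http://post/add?id=", c(42, 33, 34, 39, 37, 36, 31),
                 "&idp=", c(45, 45, 45, 42, 41, 46, 45)),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  id      = 1:5,
  idpost  = 45:49,
  idtopic = c(33, 34, 33, 33, 35),
  iduser  = c(101, 102, 103, 101, 104)
)

# ASSUMPTION: "users" is not given in the thread; these ids are invented
users <- c(101, 102)

# Step 1: extract the id from the url of each category "A" row
isA   <- dat1$Categ == "A"
topic <- as.numeric(sub("^.*[?&]id=(\\d+).*$", "\\1", dat1$url[isA]))

# Steps 2-3: for each id, look up the matching iduser values and flag the row
# if at least one of them is not in "users"
unknown <- vapply(topic, function(tp) {
  any(!dat2$iduser[dat2$idtopic == tp] %in% users)
}, logical(1))

# Step 4: flagged "A" rows become "F", the rest keep their category
dat1$Categ[isA][unknown] <- "F"
dat1$Categ
#[1] "B" "F" "A" "B" "C" "D" "A"
```

With these invented users, only row 2 changes, because iduser 103 appears under idtopic 33 but is not in "users". Note that id 31 has no rows in dat2, so any() sees an empty vector and the row stays "A", whereas the idtopic-based code above turns it into "F". Which behaviour is wanted depends on what a missing topic should mean.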


A.K.






________________________________
From: srecko joksimovic <sreckojoksimo...@gmail.com>
To: arun <smartpink...@yahoo.com> 
Sent: Thursday, August 29, 2013 5:38 PM
Subject: Re: [R] Add new calculated column to data frame



Hi Arun,

I really appreciate your help, and we did a great job :)
but, now I think that R can do anything, so I'd like to try one more thing, if 
you don't mind...

from the table with categories, 

#  id module  event       time time_on_task Categ    url
#1  1    sys  login 1373502892           80     B         http:
#2  2   task    add 1373502892           80     A         http:
#3  3   task    add 1373502972           23     A         http:
#4  4    sys  login 1373502892           80     B          http:
#5  5   list delete 1373502995          901     C
#6  6   list   view 1373503896          100     D
#7  7   task    add 1373503996           NA     A


I'd like to use only a certain category (for example A). Each of these rows has 
a url whose format is something like http://post/add?id=33&idp=45. The first step 
would be to extract this id (33 in this case). Based on that value, I want to 
find all "iduser" values in the following table:

id idpost idtopic iduser
1   45      33       101
2   46      34       102
3   47      33       103
4   48      33       101
5   49      35       104


The next step would be to check if at least one of these values (iduser) is not 
in the vector "users" (which holds only ids). If that is the case, I want to change 
the category to F; if not, I want to keep the same category.

If this is too much for one question, I'll implement this in Java, but I'd 
really like to try this with R. Maybe this id extraction from url is the most 
important problem... I tried most of these steps, but still not able to put 
them all together...

Thank you so much for your time.
Srecko








On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink...@yahoo.com> wrote:

>Hi Srecko,
>No problem.
>
>Arun
>
>
>
>
>
>
>________________________________
>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Sent: Thursday, August 29, 2013 3:19 PM
>
>Subject: Re: [R] Add new calculated column to data frame
>
>
>
>This is great Arun, thank you again.
>
>I was thinking of using sqldf and issuing a query for each module-action 
>combination, but this is much better. Since I have a table with categories 
>(module, action, category), I could create a vector "levels" based on the first 
>two columns and a vector "labels" based on the category column, and that should 
>do the work...
>
>Best,
>Srecko
>
>
>
>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink...@yahoo.com> wrote:
>
>>Hi Srecko,
>>
>>You didn't mention the order in which the letters are assigned.  If you need 
>>a different order, just change the order in the ",levels=c(....),".
>>Arun
>>
>>
>>
>>
>>----- Original Message -----
>>From: arun <smartpink...@yahoo.com>
>>To: srecko joksimovic <sreckojoksimo...@gmail.com>
>>Cc: R help <r-help@r-project.org>
>>
>>Sent: Thursday, August 29, 2013 3:13 PM
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>
>>
>>Hi,
>>You could try this:
>>dat1<- read.table(text="
>>id  module    event       time                       time_on_task
>>1   sys         login         1373502892           80
>>2   task        add          1373502892           80
>>3   task        add          1373502972           23
>>4   sys         login         1373502892           80
>>5   list         delete       1373502995          901
>>6   list          view         1373503896          100
>>7   task        add          1373503996           NA
>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>> 
>>dat1$Categ<-as.character(factor(with(dat1,paste(module,event,sep="_")),levels=c("task_add","sys_login","list_delete","list_view"),labels=LETTERS[1:4]))
>>
>>
>>dat1
>>#  id module  event       time time_on_task Categ
>>#1  1    sys  login 1373502892           80     B
>>#2  2   task    add 1373502892           80     A
>>#3  3   task    add 1373502972           23     A
>>#4  4    sys  login 1373502892           80     B
>>#5  5   list delete 1373502995          901     C
>>#6  6   list   view 1373503896          100     D
>>#7  7   task    add 1373503996           NA     A
>>A.K.
>>
>>________________________________
>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>To: arun <smartpink...@yahoo.com>
>>Cc: R help <R-help@r-project.org>
>>Sent: Thursday, August 29, 2013 2:34 PM
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>
>>
>>Hi Arun,
>>
>>There is one more question... you explained to me how to use 
>>split(dat1,cumsum(dat1$action=="login")) in one of previous questions, and 
>>that is great.
>>Now, if I have something like this:
>>
>>id  module    event       time                       time_on_task
>>1   sys         login         1373502892           80
>>2   task        add          1373502892           80
>>3   task        add          1373502972           23
>>4   sys         login         1373502892           80
>>5   list         delete       1373502995          901
>>6   list          view         1373503896          100
>>7   task        add          1373503996           NA
>>I know how to split at each "login" occurrence, and I know how to add a new 
>>column with time differences. But how do I add a new column "category" which 
>>will be calculated based on the columns "module" and "event"? For example, if 
>>module=task and event=add => category=A...
>>
>>Srecko
>>
>>
>>
>>
>>
>>On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink...@yahoo.com> wrote:
>>
>>>Hi Srecko,
>>>No problem.
>>>Regards,
>>>Arun
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>________________________________
>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>To: arun <smartpink...@yahoo.com>
>>>Sent: Thursday, August 29, 2013 2:22 PM
>>>
>>>Subject: Re: [R] Add new calculated column to data frame
>>>
>>>
>>>
>>>Sorry... I should have figured it out...
>>>
>>>thanks so much!
>>>Srecko
>>>
>>>
>>>
>>>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink...@yahoo.com> wrote:
>>>
>>>>Hi,
>>>>The one you showed is:
>>>>
>>>>dat1$time_on_task<- c(diff(dat1$time),NA)
>>>>
>>>> dat1
>>>>#  id  event       time time_on_task
>>>>#1  1    add 1373502892           80
>>>>#2  2    add 1373502972           23
>>>>#3  3 delete 1373502995          901
>>>>#4  4   view 1373503896          100
>>>>#5  5    add 1373503996           NA
>>>>
>>>>
>>>>
>>>>
>>>>________________________________
>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>
>>>>To: arun <smartpink...@yahoo.com>
>>>>Cc: R help <r-help@r-project.org>
>>>>Sent: Thursday, August 29, 2013 2:15 PM
>>>>Subject: Re: [R] Add new calculated column to data frame
>>>>
>>>>
>>>>
>>>>
>>>>Thanks Arun,
>>>>
>>>>this is great. However, it should be just a little bit different:
>>>>
>>>>#  id  event       time time_on_task
>>>>#1  1    add 1373502892           80
>>>>#2  2    add 1373502972           23
>>>>#3  3 delete 1373502995          901
>>>>#4  4   view 1373503896          100
>>>>#5  5    add 1373503996           NA
>>>>
>>>>
>>>>When I calculate the difference, I need to know how long each activity lasted. It 
>>>>is id2-id1 for the first activity...
>>>>
>>>>
>>>>
>>>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>>>>
>>>>
>>>>>
>>>>>Hi,
>>>>>Try:
>>>>>dat1<- read.table(text="
>>>>>id    event    time
>>>>>
>>>>>1    add      1373502892
>>>>>2    add      1373502972
>>>>>3    delete  1373502995
>>>>>4    view      1373503896
>>>>>5    add      1373503996
>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>> dat1$time_on_task<- c(NA,diff(dat1$time))
>>>>> dat1
>>>>>#  id  event       time time_on_task
>>>>>#1  1    add 1373502892           NA
>>>>>#2  2    add 1373502972           80
>>>>>#3  3 delete 1373502995           23
>>>>>#4  4   view 1373503896          901
>>>>>#5  5    add 1373503996          100
>>>>>
>>>>>#Not sure whether this depends on the values of "event" or not..
>>>>>A.K.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>----- Original Message -----
>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
>>>>>To: R help <R-help@r-project.org>
>>>>>Cc:
>>>>>Sent: Thursday, August 29, 2013 1:52 PM
>>>>>Subject: [R] Add new calculated column to data frame
>>>>>
>>>>>Hi,
>>>>>
>>>>>I have the following data set:
>>>>>id    event    time (in sec)
>>>>>1     add      1373502892
>>>>>2     add      1373502972
>>>>>3     delete   1373502995
>>>>>4     view      1373503896
>>>>>5     add       1373503996
>>>>>...
>>>>>
>>>>>I'd like to add a new column "time on task", which is the time elapsed between two
>>>>>events (id2 - id1...). What would be the best approach to do that?
>>>>>
>>>>>Thanks,
>>>>>Srecko
>>>>>
>>>>>
>>>>>______________________________________________
>>>>>R-help@r-project.org mailing list
>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>PLEASE do read the posting guide 
>>>>>http://www.R-project.org/posting-guide.html
>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>
>>>
>>
>
