On Wed, 30 Oct 2013, sart...@voila.fr wrote:

Hi everyone,

I have a data frame with email addresses in the first column and in the second column a list of times (of different lengths) at which an email was sent from the user in the first column.

Here is an example of my data:

Email          Email_sent
j...@doe.com   "2013-09-26 15:59:55" "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
j...@shoe.com  "2013-09-26 09:50:28" "2013-09-26 14:41:24" "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02" "2013-09-27 14:41:10" "2013-09-27 15:37:36"
...

I cannot find a way to calculate, for each user, the rate at which emails
were sent:
j...@doe.com 0.02 email / hour
j...@shoe.com 0.15 email / hour
...

Can anyone help me on this problem?

You could do something like this:

## scan your data file (replace <yourfile> with the path to your file, as a quoted string)
d <- scan(<yourfile>, what = "character")

## here I use the data from above
d <- scan(textConnection('j...@doe.com "2013-09-26 15:59:55"
"2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
j...@shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24"
"2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02"
"2013-09-27 14:41:10" "2013-09-27 15:37:36"'), what = "character")

## find position of e-mail addresses
n <- grep("@", d, fixed = TRUE)

## extract list of dates
n <- c(n, length(d) + 1)
x <- lapply(1:(length(n) - 1),
  function(i) as.POSIXct(d[(n[i] + 1):(n[i+1] - 1)]))

## add e-mail addresses as names
names(x) <- d[head(n, -1)]
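
The index arithmetic above can also be written as a grouping step. This is only a sketch of an alternative, not part of the original reply: the token vector d_demo and the example.com addresses are made up for illustration, and tz = "UTC" is an assumption so that the result does not depend on the local time zone.

```r
## hypothetical token vector in the shape scan() returns:
## an address followed by that user's timestamps
d_demo <- c("user_a@example.com", "2013-09-26 15:59:55", "2013-09-27 09:48:29",
            "user_b@example.com", "2013-09-26 09:50:28")

is_addr <- grepl("@", d_demo, fixed = TRUE)  # TRUE at each e-mail address
grp <- cumsum(is_addr)                       # running user index for every token

## group the timestamp tokens by user; the explicit factor levels keep a
## user in the list even if no timestamps follow the address
by_user <- split(d_demo[!is_addr],
                 factor(grp[!is_addr], levels = seq_len(sum(is_addr))))

## convert each group to POSIXct and name it by address
x_demo <- lapply(by_user, as.POSIXct, tz = "UTC")
names(x_demo) <- d_demo[is_addr]
```

The result has the same shape as x above: a named list with one POSIXct vector per address.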

## functions that could extract quantities of interest such as
## number of mails per hour or mean time difference etc.
meantime <- function(timevec)
  mean(as.numeric(diff(timevec), units = "hours"))
numperhour <- function(timevec)
  length(timevec) / as.numeric(diff(range(timevec)), units = "hours")

## apply to full list
sapply(x, numperhour)
sapply(x, meantime)

## apply to list by date
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), numperhour))
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), meantime))
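
One caveat with the by-date calls: a day with a single mail has a zero-length time span, so numperhour returns Inf and meantime returns NaN for that day, and sapply only gives a cleanly aligned matrix when every user has mail on the same set of dates. The following is a minimal sketch, not part of the original reply, that returns one row per user and one column per date, with NA where the rate is undefined. The names rate_per_hour and daily_rates, the example.com addresses and the tz = "UTC" setting are assumptions made for illustration. The rate follows the numperhour definition above (number of mails divided by the span between the first and last mail).

```r
## sample data in the same shape as the list x built above
## (addresses are hypothetical; times are taken from the example data)
x_demo <- list(
  "user_a@example.com" = as.POSIXct(c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
                                      "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
                                    tz = "UTC"),
  "user_b@example.com" = as.POSIXct(c("2013-09-26 09:50:28", "2013-09-26 14:41:24",
                                      "2013-09-26 14:51:36", "2013-09-26 17:50:10",
                                      "2013-09-27 13:34:02", "2013-09-27 14:41:10",
                                      "2013-09-27 15:37:36"),
                                    tz = "UTC")
)

## mails per hour between the first and last mail of one day;
## NA when the span is zero (fewer than two distinct times)
rate_per_hour <- function(timevec) {
  if (length(timevec) < 2) return(NA_real_)
  span <- as.numeric(diff(range(timevec)), units = "hours")
  if (span == 0) return(NA_real_)
  length(timevec) / span
}

## matrix with one row per user and one column per calendar date
daily_rates <- function(x, tz = "UTC") {
  days <- lapply(x, function(tv) format(tv, "%Y-%m-%d", tz = tz))
  all_days <- sort(unique(unlist(days)))
  out <- matrix(NA_real_, nrow = length(x), ncol = length(all_days),
                dimnames = list(names(x), all_days))
  for (u in names(x)) {
    ## split the positions by day, then compute the rate for each day
    r <- tapply(seq_along(x[[u]]), days[[u]],
                function(i) rate_per_hour(x[[u]][i]))
    out[u, names(r)] <- r
  }
  out
}

res <- daily_rates(x_demo)
print(round(res, 2))
```

Dates on which a user sent no mail, or only one, stay NA in the result instead of turning into Inf or shifting the columns.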

hth,
Z

The ultimate goal (which seems ambitious at this time) is to calculate, for each user and each day, the rate at which emails were sent between the first and the last email of that day (to avoid taking nights into account), i.e.:

               2013-09-26           2013-09-27
j...@doe.com   1.32 emails / hour   0.56 emails / hour
j...@shoe.com  10.57 emails / hour  2.54 emails / hour
...

At this time it seems pretty impossible, but I guess I will eventually find a 
way :-)

Thanks a lot,


Sartene Bel
R learner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.