On Wed, 30 Oct 2013, [email protected] wrote:

Hi everyone,

I have a data frame with email addresses in the first column and, in the second column, a list of times (of varying length) at which that user sent an email.

Here is an example of my data:

Email               Email_sent
[email protected]   "2013-09-26 15:59:55" "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
[email protected]   "2013-09-26 09:50:28" "2013-09-26 14:41:24" "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02" "2013-09-27 14:41:10" "2013-09-27 15:37:36"
...

I cannot find a way to calculate, for each user, how often emails were sent (emails per hour):
[email protected] 0.02 email / hour
[email protected] 0.15 email / hour
...

Can anyone help me on this problem?
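Since the data are already in a data frame, the overall rate can also be computed directly on the list column, without re-reading a file. A minimal sketch, assuming the send times are character strings in UTC; the addresses below are hypothetical placeholders (the real ones are obfuscated in the archive), and the rate is taken as the number of emails divided by the hours between a user's first and last email:

```r
## Hypothetical addresses: the real ones are obfuscated in the archive
mails <- data.frame(Email = c("user1@example.com", "user2@example.com"),
                    stringsAsFactors = FALSE)

## List column: one character vector of send times per user
mails$Email_sent <- list(
  c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
    "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
  c("2013-09-26 09:50:28", "2013-09-26 14:41:24", "2013-09-26 14:51:36",
    "2013-09-26 17:50:10", "2013-09-27 13:34:02", "2013-09-27 14:41:10",
    "2013-09-27 15:37:36")
)

## Emails per hour between a user's first and last email (UTC assumed);
## NA when the span is zero (e.g. only one email)
emails_per_hour <- function(times) {
  tm <- as.POSIXct(times, tz = "UTC")
  span <- as.numeric(difftime(max(tm), min(tm), units = "hours"))
  if (span == 0) return(NA_real_)
  length(tm) / span
}

## One rate per row of the data frame, named by address
rates <- setNames(vapply(mails$Email_sent, emails_per_hour, numeric(1)),
                  mails$Email)
print(round(rates, 3))
```

With the sample times this gives about 0.220 and 0.235 emails per hour. Counting intervals instead of emails, i.e. (n - 1) / span, would be an equally defensible definition.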

You could do something like this:

## scan your data file ("yourfile.txt" is a placeholder; uncomment to use)
## d <- scan("yourfile.txt", what = "character")

## here I use the data from above
d <- scan(textConnection('[email protected] "2013-09-26 15:59:55"
"2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
[email protected] "2013-09-26 09:50:28" "2013-09-26 14:41:24"
"2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02"
"2013-09-27 14:41:10" "2013-09-27 15:37:36"'), what = "character")

## find position of e-mail addresses
n <- grep("@", d, fixed = TRUE)

## extract list of dates
n <- c(n, length(d) + 1)
x <- lapply(1:(length(n) - 1),
  function(i) as.POSIXct(d[(n[i] + 1):(n[i+1] - 1)]))

## add e-mail addresses as names
names(x) <- d[head(n, -1)]

## functions that could extract quantities of interest such as
## number of mails per hour or mean time difference etc.
meantime <- function(timevec)
  mean(as.numeric(diff(timevec), units = "hours"))
numperhour <- function(timevec)
  length(timevec) / as.numeric(diff(range(timevec)), units = "hours")

## apply to full list
sapply(x, numperhour)
sapply(x, meantime)

## apply to list by date
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), numperhour))
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), meantime))
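A caveat for the by-date calls: on a day with a single email (the first user on 2013-09-26 in the sample), diff(range(timevec)) is zero, so numperhour returns Inf, and meantime returns NaN because it averages an empty vector of differences. A minimal sketch of guarded variants, with the hypothetical names numperhour_safe and meantime_safe, that return NA in that case instead:

```r
## Hypothetical guarded versions of numperhour() and meantime():
## they return NA when a group has fewer than two emails, instead of
## Inf (zero time span) or NaN (mean of no differences).
numperhour_safe <- function(timevec) {
  if (length(timevec) < 2) return(NA_real_)
  length(timevec) / as.numeric(diff(range(timevec)), units = "hours")
}
meantime_safe <- function(timevec) {
  if (length(timevec) < 2) return(NA_real_)
  mean(as.numeric(diff(timevec), units = "hours"))
}

## Sample days taken from the post (time zone assumed to be UTC)
one   <- as.POSIXct("2013-09-26 15:59:55", tz = "UTC")
three <- as.POSIXct(c("2013-09-27 09:48:29", "2013-09-27 10:00:02",
                      "2013-09-27 10:12:54"), tz = "UTC")

numperhour_safe(one)    # NA rather than Inf
numperhour_safe(three)  # 3 emails over about 0.41 hours
meantime_safe(three)    # mean gap between consecutive emails, in hours
```

They can be used in the same sapply()/tapply() calls in place of numperhour and meantime.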

hth,
Z

The ultimate goal (which seems ambitious at this point) is to calculate, for each user and each day, the email frequency between the first and the last email sent that day (so that nights are not taken into account), i.e.:

                   2013-09-26            2013-09-27
[email protected]    1.32 emails / hour    0.56 emails / hour
[email protected]   10.57 emails / hour    2.54 emails / hour
...

At this point it seems nearly impossible, but I guess I will eventually find a way :-)
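The by-date sapply()/tapply() calls in the reply above already compute these per-day rates. To get them laid out as one table with users as rows and days as columns, one option is to flatten the data to one row per email and call tapply() with two grouping factors. A minimal sketch, assuming UTC time stamps and hypothetical placeholder addresses; like numperhour, the rate counts all emails of the day divided by the hours between the first and the last one, and days with fewer than two emails get NA:

```r
## Hypothetical addresses; send times copied from the example in the post
sent <- list(
  "user1@example.com" = c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
                          "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
  "user2@example.com" = c("2013-09-26 09:50:28", "2013-09-26 14:41:24",
                          "2013-09-26 14:51:36", "2013-09-26 17:50:10",
                          "2013-09-27 13:34:02", "2013-09-27 14:41:10",
                          "2013-09-27 15:37:36")
)

## Long format: one row per email (time zone assumed to be UTC)
long <- data.frame(
  user = rep(names(sent), lengths(sent)),
  time = as.POSIXct(unlist(sent, use.names = FALSE), tz = "UTC"),
  stringsAsFactors = FALSE
)
long$day <- format(long$time, "%Y-%m-%d")

## Emails per hour between the first and last email of a day,
## given the send times in seconds; NA with fewer than two emails
day_rate <- function(secs) {
  if (length(secs) < 2) return(NA_real_)
  length(secs) / ((max(secs) - min(secs)) / 3600)
}

## Rows = users, columns = days; user/day pairs without email are NA too
rates <- tapply(as.numeric(long$time), list(long$user, long$day), day_rate)
print(round(rates, 2))
```

Working on the numeric seconds keeps tapply() away from date-time class handling; the matrix can be passed to write.csv() or formatted further as needed.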

Thanks a lot,


Sartene Bel
R learner

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.