On Apr 4, 2012, at 8:19 AM, Petr PIKAL wrote:

Hi


Dear Petr,

thanks for taking the time.

For this input, the first element should be selected, since more than 3
other dates fall within one year of it (basically, all other dates are
within one year) and at least one of them is more than 3 months later.

In the meantime, I came up with some code (probably) doing what I want:

identify_first_date = function(dates)
{
  # pairwise gaps in days; after masking, column j only flags entries
  # that come after element j (so dates are assumed to be sorted)
  within_one_year = as.matrix(dist(dates)) < 366   ### next dates within one year?
  within_one_year[upper.tri(within_one_year, diag=TRUE)] = FALSE

  within_one_month = as.matrix(dist(dates)) < 91   ### next dates within 90 days?
  within_one_month[upper.tri(within_one_month, diag=TRUE)] = FALSE

  dates[
    which(
      ### more dates within one year than within 90 days
      apply(within_one_year,2,sum) > apply(within_one_month,2,sum) &
      ### at least 3 more dates within one year
      apply(within_one_year,2,sum) >= 3
    )[1]]
}

I guess the code could be improved, though, as it takes some time to run.
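To see what the masking step in the function above does, here is a small sketch on four made-up dates (the dates and the name `gaps` are illustrative only, and `as.numeric()` is applied before `dist()` just to be explicit). Once the upper triangle and the diagonal are set to FALSE, column j is TRUE only for entries after element j, so the column sums count the follow-up dates inside the window. `colSums()` gives the same counts as `apply(x, 2, sum)` and is usually quicker.

```r
# Illustrative data, not from the thread
dates <- as.Date(c("2001-01-01", "2001-01-03", "2001-04-20", "2002-06-01"))

# Pairwise gaps in days as a full symmetric matrix
gaps <- as.matrix(dist(as.numeric(dates)))

# Keep only the lower triangle: row i > column j, i.e. later dates
within_one_year <- gaps < 366
within_one_year[upper.tri(within_one_year, diag = TRUE)] <- FALSE

# Number of later dates within 365 days of each date
later_within_year <- colSums(within_one_year)
print(unname(later_within_year))   # 2 1 0 0
```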

Your first condition can be fulfilled by

c(as.numeric(diff(dates))<365, F) > c(as.numeric(diff(dates))<91,F)

so if you put in your function

identify_first_date2 = function(dates)
{
within_one_year = as.matrix(dist(dates)) < 366
within_one_year[upper.tri(within_one_year, diag=TRUE)]=FALSE

distance<-as.numeric(diff(dates))

dates[ which( c(distance<365, F) > c(distance<91,F) &
apply(within_one_year,2,sum) >=3)[1]]
}

You should get some improvement; however, I am still struggling to work out
how many consecutive dates are within one year.
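A small sketch of what the `diff()` comparison flags, on made-up dates (the data and the name `flag` are illustrative only). Comparing two logical vectors with `>` is TRUE only where the first is TRUE and the second is FALSE, so the result marks dates whose next date is at least 91 but fewer than 365 days away. Note that it looks only at the immediately following date, not at all dates within the year.

```r
# Illustrative data, not from the thread
dates <- as.Date(c("2001-01-01", "2001-01-03", "2001-04-20", "2002-06-01"))

# Gap in days from each date to the next one
distance <- as.numeric(diff(dates))
print(distance)   # 2 107 407

# TRUE where the next date is < 365 days away but not < 91 days away;
# the padding FALSE accounts for the last date, which has no successor
flag <- c(distance < 365, FALSE) > c(distance < 91, FALSE)
print(flag)       # FALSE TRUE FALSE FALSE
```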


I added a couple of dates to the test case on which my original erroneous suggestion failed:

 dput(dates)
structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418), class = "Date")

This returns a list of "intervals" or perhaps "stretches" (?) spanning less than 365 days to assemble candidates for the first criterion:

intervals1 <- lapply(1:(length(dates)-4) , function(x) dates[which(dates - dates[x] < 365 & dates - dates[x] >=0)] )
> intervals1
[[1]]
[1] "2001-01-01" "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20"

[[2]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"

[[3]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"


This then tests whether the second to last element (the "penultimate" one in correct use of that often misused term) is at least 90 days out:

> sapply(intervals1, function(x) x[length(x)-1] - x[1] >= 90)
[1] FALSE  TRUE  TRUE
> intervals1[which( sapply(intervals1, function(x) x[length(x)-1] - x[1] >90)) ]
[[1]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"

[[2]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"


And this returns the starting date from that result:

> intervals1[which( sapply(intervals1, function(x) x[length(x)-1] - x[1] >90)) ][[1]][1]
[1] "2001-01-03"

I see that I should have added a test for length greater than 3 but that should not be difficult.

> intervals1[which( sapply(intervals1,
       function(x) x[length(x)-1] - x[1] >90 & length(x) >3)) ][[1]][1]
[1] "2001-01-03"
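The steps above could be collected into one function. This is only a sketch: the function name is made up, the guard for short vectors is an addition, and `vapply()` with `&&` replaces the `sapply()` call so that stretches of length 1 are handled safely. It keeps the penultimate-element test as posted, which is stricter than asking whether any date in the stretch is more than 90 days out.

```r
# Hypothetical wrapper; name and short-vector guard are not from the thread
first_start_by_stretch <- function(dates) {
  dates <- sort(dates)
  if (length(dates) < 5) return(as.Date(NA))

  # Stretches of dates spanning less than 365 days from each starting date
  stretches <- lapply(seq_len(length(dates) - 4), function(i) {
    d <- as.numeric(dates - dates[i])
    dates[d >= 0 & d < 365]
  })

  # More than 3 dates, and the penultimate one more than 90 days out
  ok <- vapply(stretches, function(x) {
    length(x) > 3 && as.numeric(x[length(x) - 1] - x[1]) > 90
  }, logical(1))

  if (!any(ok)) return(as.Date(NA))
  stretches[[which(ok)[1]]][1]
}

dates <- structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418), class = "Date")
result <- first_start_by_stretch(dates)
print(result)   # "2001-01-03"
```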


--
David.





Best,
Felix


-----Original Message-----
From: Petr PIKAL [mailto:petr.pi...@precheza.cz]
Sent: Wednesday, April 4, 2012 09:47
To: Fischer, Felix
Cc: r-help@r-project.org
Subject: Odp: [R] identify time span in date vector

Hi

Can you please be more specific? Based on this input, what do you want
as a result?

set.seed(111)
dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")
dates
[1] "2007-08-01" "2007-10-21" "2007-12-08" "2007-12-15" "2008-01-29" "2008-02-14" "2008-02-16" "2008-03-01"
[9] "2008-04-02" "2008-04-11"


Regards
Petr


Hello everyone,

I am trying to identify the first element of a date vector for which the
following condition holds: at least 3 more dates within the next 365 days,
but at least one of these must be 3-12 months later.

dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")

Does anyone have an idea how to do this economically? I'll need to apply this
to a large dataset with date vectors of various lengths, and I can think only
of quite complicated algorithms :(

Any ideas would be appreciated,
Felix
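For reference, the condition as stated can also be checked directly, one candidate date at a time. This is a minimal sketch under assumptions: the function name and its arguments are made up, "3 months" is taken as 90 days (as in the code elsewhere in this thread), and "within the next 365 days" is taken to include day 365.

```r
# Hypothetical helper; name, arguments and the 90-day reading are assumptions
first_date_with_followups <- function(dates, n_more = 3, window = 365, min_gap = 90) {
  dates <- sort(dates)
  for (i in seq_along(dates)) {
    # Gaps in days from dates[i] to every later date
    ahead <- as.numeric(dates[-seq_len(i)] - dates[i])
    ahead <- ahead[ahead <= window]
    # Enough follow-up dates, and at least one of them far enough out
    if (length(ahead) >= n_more && any(ahead >= min_gap)) {
      return(dates[i])
    }
  }
  as.Date(NA)   # no date satisfies the condition
}

dates <- structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418), class = "Date")
result <- first_date_with_followups(dates)
print(result)   # "2001-01-01"
```

The loop stops at the first qualifying date, so for long vectors it usually avoids building the full distance matrix.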



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
West Hartford, CT
