On Apr 4, 2012, at 8:19 AM, Petr PIKAL wrote:
Hi
Dear Petr,
thanks for taking the time.
For this input, the first element should be selected, since there are
more than 3 further dates within one year (basically, all other dates are
within one year) and at least one of them is more than 3 months later.
In the meantime, I came up with some code that (probably) does what I
want:
identify_first_date = function(dates)
{
  # lower triangle: TRUE where a later date lies within one year (< 366 days)
  within_one_year = as.matrix(dist(dates)) < 366
  within_one_year[upper.tri(within_one_year, diag=TRUE)] = FALSE
  # lower triangle: TRUE where a later date lies within 90 days
  within_one_month = as.matrix(dist(dates)) < 91
  within_one_month[upper.tri(within_one_month, diag=TRUE)] = FALSE
  dates[
    which(
      # more later dates within one year than within 90 days
      apply(within_one_year,2,sum) > apply(within_one_month,2,sum) &
      # at least 3 later dates within one year
      apply(within_one_year,2,sum) >= 3
    )[1]]
}
I guess the code could be improved, though; it takes some time.
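One possible way to avoid building the full distance matrix is sketched below. It is not code from this thread: the function name `first_date_sketch` and its arguments are made up for illustration, and it assumes `dates` is sorted in ascending order and contains no NAs. It uses `findInterval()` to count, for each date, how many later dates fall within 365 and within 90 days, and then applies the same two conditions as `identify_first_date`.

```r
# Hypothetical helper, not from the thread.
# Assumes `dates` is sorted ascending and has no NAs.
first_date_sketch <- function(dates, year = 365, quarter = 90, min_later = 3) {
  d <- as.numeric(dates)
  idx <- seq_along(d)
  # findInterval(x, d) gives the number of elements of d that are <= x,
  # i.e. the index of the last date inside each window
  last_in_year <- findInterval(d + year, d)
  last_in_quarter <- findInterval(d + quarter, d)
  n_year <- last_in_year - idx        # later dates within `year` days
  n_quarter <- last_in_quarter - idx  # later dates within `quarter` days
  # enough later dates in the year, and at least one beyond `quarter` days
  hit <- which(n_year >= min_later & n_year > n_quarter)
  if (length(hit) == 0) return(as.Date(NA))
  dates[hit[1]]
}

test_dates <- as.Date(c("2001-01-01", "2001-01-03", "2001-01-12",
                        "2001-01-13", "2001-04-20", "2002-01-01",
                        "2004-01-01"))
print(first_date_sketch(test_dates))
```

Because the windows are located by binary search, the work grows roughly with n log n rather than with the n x n matrix that `dist()` creates, which may matter for long date vectors.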
Your first condition can be fulfilled by
c(as.numeric(diff(dates))<365, F) > c(as.numeric(diff(dates))<91,F)
so if you put it in your function
identify_first_date2 = function(dates)
{
within_one_year = as.matrix(dist(dates)) < 366
within_one_year[upper.tri(within_one_year, diag=TRUE)]=FALSE
distance<-as.numeric(diff(dates))
dates[ which( c(distance<365, F) > c(distance<91,F) &
apply(within_one_year,2,sum) >=3)[1]]
}
You should get some improvement; however, I am still struggling to
evaluate how many consecutive dates are within one year.
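To see what the comparison of the two logical vectors in identify_first_date2 does, here is a small sketch on three made-up dates (the names `toy`, `gap_under_year` and `gap_under_quarter` are only for illustration and do not appear in the thread):

```r
# Three made-up dates: gaps of 2 and 107 days between neighbours.
toy <- as.Date(c("2001-01-01", "2001-01-03", "2001-04-20"))
distance <- as.numeric(diff(toy))              # 2 107
# Pad with FALSE so the result has one entry per date.
gap_under_year    <- c(distance < 365, FALSE)  # TRUE TRUE FALSE
gap_under_quarter <- c(distance < 91, FALSE)   # TRUE FALSE FALSE
# TRUE > FALSE is the only combination that yields TRUE:
# the gap to the next date is under a year but not under 91 days.
flag <- gap_under_year > gap_under_quarter
print(flag)
```

Note that `diff()` only looks at the gap to the immediately following date, so the first date is not flagged here even though 2001-04-20 is 109 days after it.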
I added a couple of dates to the test case on which my original
erroneous suggestion failed:
dput(dates)
structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418), class =
"Date")
This returns a list of "intervals" or perhaps "stretches" (?) spanning
less than 365 days to assemble candidates for the first criterion:
intervals1 <- lapply(1:(length(dates)-4) , function(x)
dates[which(dates - dates[x] < 365 & dates - dates[x] >=0)] )
> intervals1
[[1]]
[1] "2001-01-01" "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20"
[[2]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"
[[3]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"
This then tests whether the second to last element (the "penultimate"
one in correct use of that often misused term) is at least 90 days out:
> sapply(intervals1, function(x) x[length(x)-1] - x[1] >= 90)
[1] FALSE TRUE TRUE
> intervals1[which( sapply(intervals1, function(x) x[length(x)-1] -
x[1] >90)) ]
[[1]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"
[[2]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"
And this returns the starting date from that result:
> intervals1[which( sapply(intervals1, function(x) x[length(x)-1] -
x[1] >90)) ][[1]][1]
[1] "2001-01-03"
I see that I should have added a test for length greater than 3 but
that should not be difficult.
> intervals1[which( sapply(intervals1,
function(x) x[length(x)-1] - x[1] >90 & length(x) >3)) ][[1]][1]
[1] "2001-01-03"
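For reuse, the steps above could be wrapped into one function along these lines. This is only a sketch: the name `first_start_sketch` and its arguments are invented here, it assumes `dates` is sorted ascending, and it loops over every date instead of stopping at `length(dates)-4`, relying on the length test to discard short stretches.

```r
# Hypothetical wrapper around the steps shown above; not from the thread.
# Assumes `dates` is sorted ascending.
first_start_sketch <- function(dates, span = 365, gap = 90, min_len = 3) {
  # one candidate stretch per date: that date plus all later dates
  # fewer than `span` days after it
  stretches <- lapply(seq_along(dates), function(i) {
    dates[dates - dates[i] < span & dates - dates[i] >= 0]
  })
  # keep stretches with more than `min_len` dates whose penultimate
  # date is more than `gap` days after the first one;
  # && short-circuits, so short stretches never reach the subtraction
  ok <- vapply(stretches, function(x) {
    length(x) > min_len && as.numeric(x[length(x) - 1] - x[1]) > gap
  }, logical(1))
  if (!any(ok)) return(as.Date(NA))
  stretches[[which(ok)[1]]][1]
}

dates <- structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418),
                   class = "Date")
print(first_start_sketch(dates))
```

On the test vector from `dput(dates)` this picks the same start date as the interactive steps, "2001-01-03".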
--
David.
Best,
Felix
-----Original Message-----
From: Petr PIKAL [mailto:petr.pi...@precheza.cz]
Sent: Wednesday, 4 April 2012 09:47
To: Fischer, Felix
Cc: r-help@r-project.org
Subject: Odp: [R] identify time span in date vector
Hi
Can you please be more specific? Based on this input, what do you
want as a result?
set.seed(111)
dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")
dates
[1] "2007-08-01" "2007-10-21" "2007-12-08" "2007-12-15" "2008-01-29" "2008-02-14" "2008-02-16" "2008-03-01"
[9] "2008-04-02" "2008-04-11"
Regards
Petr
Hello everyone,
I am trying to identify the first element of a date vector for which the
following condition holds: at least 3 more dates within the next 365 days,
but at least one of these must be 3-12 months later.
dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")
Does anyone have an idea how to do this economically? I'll need to apply
this to a large dataset with date vectors of various lengths, and I can
think only of quite difficult algorithms :(
Any ideas would be appreciated,
Felix
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT