On Tue, 2 Dec 2014, Marc Schwartz wrote:
On Dec 2, 2014, at 9:29 AM, Matthias Weber <matthias.we...@fntsoftware.com>
wrote:
Hello together,
i have a data.frame with date-values. What I want is a data.frame with a
several lines for each date.
My current data.frame looks like this one:
ID FROM TO REASON
1 2015-02-27 2015-02-28 Holiday
1 2015-03-15 2015-03-20 Illness
2 2015-05-20 2015-02-23 Holiday
2 2015-06-01 2015-06-03 Holiday
2 2015-07-01 2015-07-01 Illness
The result looks like this one:
ID DATE REASON
1 2015-02-27 Holiday
1 2015-02-28 Holiday
1 2015-03-15 Illness
1 2015-03-16 Illness
1 2015-03-17 Illness
1 2015-03-18 Illness
1 2015-03-19 Illness
1 2015-03-20 Illness
2 2015-05-20 Holiday
2 2015-05-21 Holiday
2 2015-05-22 Holiday
2 2015-05-23 Holiday
2 2015-06-01 Holiday
2 2015-06-02 Holiday
2 2015-06-02 Holiday
2 2015-07-01 Illness
Maybe anyone can help me, how I can do this.
Thank you.
Best regards.
Mat
A quick and dirty approach.
First, note that in your source data frame, the TO value in the third row is
incorrect. I changed it here:
DF
ID FROM TO REASON
1 1 2015-02-27 2015-02-28 Holiday
2 1 2015-03-15 2015-03-20 Illness
3 2 2015-05-20 2015-05-23 Holiday
4 2 2015-06-01 2015-06-03 Holiday
5 2 2015-07-01 2015-07-01 Illness
With that in place, you can use R's recycling of values to create multiple data
frame rows from the date sequences and the single ID and REASON entries:
i <- 1
data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON
= DF$REASON[i])
ID DATE REASON
1 1 2015-02-27 Holiday
2 1 2015-02-28 Holiday
So just put that into an lapply() based loop, which returns a list:
DF.TMP <- lapply(seq(nrow(DF)),
function(i) data.frame(ID = DF$ID[i],
DATE = seq(DF$FROM[i], DF$TO[i], by =
"day"),
REASON = DF$REASON[i]))
DF.TMP
[[1]]
ID DATE REASON
1 1 2015-02-27 Holiday
2 1 2015-02-28 Holiday
[[2]]
ID DATE REASON
1 1 2015-03-15 Illness
2 1 2015-03-16 Illness
3 1 2015-03-17 Illness
4 1 2015-03-18 Illness
5 1 2015-03-19 Illness
6 1 2015-03-20 Illness
[[3]]
ID DATE REASON
1 2 2015-05-20 Holiday
2 2 2015-05-21 Holiday
3 2 2015-05-22 Holiday
4 2 2015-05-23 Holiday
[[4]]
ID DATE REASON
1 2 2015-06-01 Holiday
2 2 2015-06-02 Holiday
3 2 2015-06-03 Holiday
[[5]]
ID DATE REASON
1 2 2015-07-01 Illness
Then use do.call() on the result:
do.call(rbind, DF.TMP)
ID DATE REASON
1 1 2015-02-27 Holiday
2 1 2015-02-28 Holiday
3 1 2015-03-15 Illness
4 1 2015-03-16 Illness
5 1 2015-03-17 Illness
6 1 2015-03-18 Illness
7 1 2015-03-19 Illness
8 1 2015-03-20 Illness
9 2 2015-05-20 Holiday
10 2 2015-05-21 Holiday
11 2 2015-05-22 Holiday
12 2 2015-05-23 Holiday
13 2 2015-06-01 Holiday
14 2 2015-06-02 Holiday
15 2 2015-06-03 Holiday
16 2 2015-07-01 Illness
See ?seq.Date for the critical step.
Regards,
Marc Schwartz
Same thing, with some optional syntactic sugar:
library(dplyr)
dta <- read.table( text=
"ID FROM TO REASON
1 2015-02-27 2015-02-28 Holiday
1 2015-03-15 2015-03-20 Illness
2 2015-05-20 2015-05-23 Holiday
2 2015-06-01 2015-06-03 Holiday
2 2015-07-01 2015-07-01 Illness
", header=TRUE, as.is=TRUE )
# Wrap function sequence in parentheses so pipes can be at beginning
# of line
( dta
# data not provided using dput, so date columns are character
%>% mutate( FROM = as.Date(FROM)
, TO = as.Date(TO)
)
# process data frame one row at a time
%>% rowwise
# form a new data frame using each row, results automatically
# rbind()ed
%>% do( data.frame( ID=.$ID
, DATE=seq.Date( .$FROM, .$TO, by="day" )
, REASON=.$REASON
, stringsAsFactors=FALSE
)
)
# optionally drop "data frame features" provided by dplyr to get
# comparable result as above
%>% as.data.frame
)
Read the dplyr and magrittr package help files to learn more about this
method of handling data. I think Marc's solution is worth understanding
because that is really what dplyr is doing for you, but it can get tedious
to do that whole process yourself day-in and day-out.
Dplyr can also be used in conjunction with data.tables package or SQL,
which can be good if you have a lot of data to work with... again, just
syntactic sugar, but convenient.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.