On Tue, 2 Dec 2014, Marc Schwartz wrote:


On Dec 2, 2014, at 9:29 AM, Matthias Weber <matthias.we...@fntsoftware.com> 
wrote:

Hello everyone,

I have a data.frame with date values. What I want is a data.frame with a 
separate line for each date.

My current data.frame looks like this:

ID     FROM         TO                REASON
1      2015-02-27   2015-02-28    Holiday
1      2015-03-15   2015-03-20    Illness
2      2015-05-20   2015-02-23    Holiday
2      2015-06-01   2015-06-03    Holiday
2      2015-07-01   2015-07-01    Illness

The result should look like this:

ID   DATE           REASON
1    2015-02-27    Holiday
1    2015-02-28    Holiday
1    2015-03-15    Illness
1    2015-03-16    Illness
1    2015-03-17    Illness
1    2015-03-18    Illness
1    2015-03-19    Illness
1    2015-03-20    Illness
2    2015-05-20    Holiday
2    2015-05-21    Holiday
2    2015-05-22    Holiday
2    2015-05-23    Holiday
2    2015-06-01    Holiday
2    2015-06-02    Holiday
2    2015-06-03    Holiday
2    2015-07-01    Illness

Maybe someone can help me with how to do this.

Thank you.

Best regards.

Mat


A quick and dirty approach.

First, note that in your source data frame, the TO value in the third row 
(2015-02-23) is earlier than its FROM value, which looks like a typo. I changed 
it to 2015-05-23 here:

DF
 ID       FROM         TO  REASON
1  1 2015-02-27 2015-02-28 Holiday
2  1 2015-03-15 2015-03-20 Illness
3  2 2015-05-20 2015-05-23 Holiday
4  2 2015-06-01 2015-06-03 Holiday
5  2 2015-07-01 2015-07-01 Illness
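
For anyone reproducing this, here is a minimal sketch of how DF might be built. The original post did not include dput() output, so the column classes are an assumption: FROM and TO are taken to be of class Date, which the seq() call with by = "day" relies on.

```r
# Assumed reconstruction of DF, using the corrected TO value in row 3.
# FROM and TO are converted to Date so that seq(..., by = "day") dispatches
# to seq.Date().
DF <- data.frame(
  ID     = c(1L, 1L, 2L, 2L, 2L),
  FROM   = as.Date(c("2015-02-27", "2015-03-15", "2015-05-20",
                     "2015-06-01", "2015-07-01")),
  TO     = as.Date(c("2015-02-28", "2015-03-20", "2015-05-23",
                     "2015-06-03", "2015-07-01")),
  REASON = c("Holiday", "Illness", "Holiday", "Holiday", "Illness"),
  stringsAsFactors = FALSE
)

# Inspect the column classes before going further.
str(DF)
```

If the dates arrive as character strings instead (for example from read.table()), they would need as.Date() first.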

With that in place, you can use R's recycling of values to create multiple data 
frame rows from the date sequences and the single ID and REASON entries:

i <- 1

data.frame(ID = DF$ID[i],
           DATE = seq(DF$FROM[i], DF$TO[i], by = "day"),
           REASON = DF$REASON[i])
 ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday


So just put that into an lapply() based loop, which returns a list:

DF.TMP <- lapply(seq(nrow(DF)),
                 function(i) data.frame(ID = DF$ID[i],
                                        DATE = seq(DF$FROM[i], DF$TO[i],
                                                   by = "day"),
                                        REASON = DF$REASON[i]))

DF.TMP
[[1]]
 ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday

[[2]]
 ID       DATE  REASON
1  1 2015-03-15 Illness
2  1 2015-03-16 Illness
3  1 2015-03-17 Illness
4  1 2015-03-18 Illness
5  1 2015-03-19 Illness
6  1 2015-03-20 Illness

[[3]]
 ID       DATE  REASON
1  2 2015-05-20 Holiday
2  2 2015-05-21 Holiday
3  2 2015-05-22 Holiday
4  2 2015-05-23 Holiday

[[4]]
 ID       DATE  REASON
1  2 2015-06-01 Holiday
2  2 2015-06-02 Holiday
3  2 2015-06-03 Holiday

[[5]]
 ID       DATE  REASON
1  2 2015-07-01 Illness


Then use do.call() with rbind() to combine the list into a single data frame:

do.call(rbind, DF.TMP)
  ID       DATE  REASON
1   1 2015-02-27 Holiday
2   1 2015-02-28 Holiday
3   1 2015-03-15 Illness
4   1 2015-03-16 Illness
5   1 2015-03-17 Illness
6   1 2015-03-18 Illness
7   1 2015-03-19 Illness
8   1 2015-03-20 Illness
9   2 2015-05-20 Holiday
10  2 2015-05-21 Holiday
11  2 2015-05-22 Holiday
12  2 2015-05-23 Holiday
13  2 2015-06-01 Holiday
14  2 2015-06-02 Holiday
15  2 2015-06-03 Holiday
16  2 2015-07-01 Illness
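
The lapply() and do.call() steps could be wrapped into one reusable function, roughly as sketched below. The name expand_dates() is hypothetical (it is not part of base R), and the sketch assumes the input has Date columns named FROM and TO plus ID and REASON columns.

```r
# expand_dates() is a hypothetical helper name, not an existing R function.
# It assumes df has columns ID, FROM, TO, REASON, with FROM/TO of class Date.
expand_dates <- function(df) {
  # seq_len() is used instead of seq(nrow(df)) so that a zero-row input
  # yields an empty index vector rather than c(1, 0).
  rows <- lapply(seq_len(nrow(df)), function(i) {
    data.frame(ID     = df$ID[i],
               DATE   = seq(df$FROM[i], df$TO[i], by = "day"),
               REASON = df$REASON[i],
               stringsAsFactors = FALSE)
  })
  # Stack the per-row data frames into one.
  do.call(rbind, rows)
}

# Small example input: one two-day range and one single-day range.
DF <- data.frame(ID     = c(1L, 2L),
                 FROM   = as.Date(c("2015-02-27", "2015-07-01")),
                 TO     = as.Date(c("2015-02-28", "2015-07-01")),
                 REASON = c("Holiday", "Illness"),
                 stringsAsFactors = FALSE)

res <- expand_dates(DF)
print(res)
```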


See ?seq.Date for the critical step.
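
As a small illustration of why the third row had to be corrected first, here is a sketch of seq() on Date values, using the dates from that row. The variable names are made up for the example.

```r
# Dates from the third row of the source data (variable names are
# illustrative only).
from    <- as.Date("2015-05-20")
good_to <- as.Date("2015-05-23")   # corrected TO value
bad_to  <- as.Date("2015-02-23")   # TO value as originally posted

# A forward range gives one Date per day, both endpoints included.
days <- seq(from, good_to, by = "day")
print(days)
length(days)   # 4

# With by = "day", a TO date earlier than FROM is an error rather than
# an empty or reversed sequence, so it is caught here.
res <- tryCatch(seq(from, bad_to, by = "day"), error = function(e) e)
inherits(res, "error")
```

So a check along the lines of all(DF$TO >= DF$FROM) before the lapply() step would flag such rows early.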

Regards,

Marc Schwartz

Same thing, with some optional syntactic sugar:

library(dplyr)
dta <- read.table( text=
"ID     FROM         TO                REASON
1      2015-02-27   2015-02-28    Holiday
1      2015-03-15   2015-03-20    Illness
2      2015-05-20   2015-05-23    Holiday
2      2015-06-01   2015-06-03    Holiday
2      2015-07-01   2015-07-01    Illness
", header=TRUE, as.is=TRUE )

# Wrap function sequence in parentheses so pipes can be at beginning
# of line
(     dta
      # data not provided using dput, so date columns are character
  %>% mutate( FROM = as.Date(FROM)
            , TO = as.Date(TO)
            )
      # process data frame one row at a time
  %>% rowwise
      # form a new data frame using each row, results automatically
      # rbind()ed
  %>% do( data.frame( ID=.$ID
                    , DATE=seq.Date( .$FROM, .$TO, by="day" )
                    , REASON=.$REASON
                    , stringsAsFactors=FALSE
                    )
        )
      # optionally drop "data frame features" provided by dplyr to get
      # comparable result as above
  %>% as.data.frame
)

Read the dplyr and magrittr package help files to learn more about this method of handling data. I think Marc's solution is worth understanding because that is really what dplyr is doing for you, but it can get tedious to do that whole process yourself day-in and day-out.

dplyr can also be used in conjunction with the data.table package or with SQL databases, which can be good if you have a lot of data to work with... again, just syntactic sugar, but convenient.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
