Try this, but I get only 2 changes for CB27A instead of the 3 you indicated:

> require(data.table)
> x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
+ CB26A    201302         1
+ CB26A    201302         0
+ CB26A    201302         0
+ CB26A    201303         1
+ CB26A    201303         1
+ CB26A    201304         0
+ CB26A    201305         1
+ CB26A    201305         0
+ CB26A    201306         1
+ CB27A    201304         0
+ CB27A    201304         0
+ CB27A    201305         1
+ CB27A    201306         1
+ CB27A    201306         0
+ CB27A    201307         0
+ CB27A    201308         1", header = TRUE, as.is = TRUE)
> setDT(x)
> # convert to a Date object for comparison
> x[, MYD := as.Date(paste0(YEAR_MTH, '01'), format = "%Y%m%d")]
> # group by CASE_ID and keep only the rows within 3 months of the first date
> x[
+     , {
+         # determine the end date as 3 months from the first date
+         endDate <- seq(MYD[1L], by = '3 months', length.out = 2)[2L]
+         # extract what is changing
+         changes <- ATT_1[(MYD >= MYD[1L]) & (MYD <= endDate)]
+         # now count the changes
+         list(nChanges = sum(head(changes, -1L) != tail(changes, -1L)))
+       }
+     , by = CASE_ID
+     ]
   CASE_ID nChanges
1:   CB26A        5
2:   CB27A        2
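
A note on the discrepancy: the expected 3 for CB27A appears only if the window is stretched to 4 months (through 201308), and with that window CB26A gives 6 rather than 5. The base R sketch below checks both window lengths. The names count_changes and months_ahead are hypothetical, introduced only for this illustration, and the code assumes rows are already sorted by YEAR_MTH within each CASE_ID, as in the posted data.

```r
# Same data as above, rebuilt so this block runs on its own
x <- read.table(text = "CASE_ID YEAR_MTH ATT_1
CB26A    201302         1
CB26A    201302         0
CB26A    201302         0
CB26A    201303         1
CB26A    201303         1
CB26A    201304         0
CB26A    201305         1
CB26A    201305         0
CB26A    201306         1
CB27A    201304         0
CB27A    201304         0
CB27A    201305         1
CB27A    201306         1
CB27A    201306         0
CB27A    201307         0
CB27A    201308         1", header = TRUE, as.is = TRUE)

# count_changes() is a hypothetical helper, not part of any package.
# It assumes the rows of d are sorted by YEAR_MTH.
count_changes <- function(d, months_ahead) {
  # turn YYYYMM into a running month index so the window is integer arithmetic
  idx <- (d$YEAR_MTH %/% 100L) * 12L + d$YEAR_MTH %% 100L
  # keep rows from the first month up to and including first month + months_ahead
  v <- d$ATT_1[idx >= idx[1L] & idx <= idx[1L] + months_ahead]
  # a change is any adjacent pair of values that differ
  sum(head(v, -1L) != tail(v, -1L))
}

by_case <- split(x, x$CASE_ID)
res3 <- sapply(by_case, count_changes, months_ahead = 3L)  # CB26A 5, CB27A 2
res4 <- sapply(by_case, count_changes, months_ahead = 4L)  # CB26A 6, CB27A 3
print(rbind(`3 months` = res3, `4 months` = res4))
```

So no single window length reproduces both 5 and 3; it may be worth confirming which window is intended for each CASE_ID.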

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Jul 30, 2014 at 3:08 AM, Abhinaba Roy <abhinabaro...@gmail.com> wrote:
> Dear R-helpers,
>
> I want to count the number of times ATT_1 has changed in a period of 3
> months (can be 4 months) from the first YEAR_MTH entry for a CASE_ID. So if
> for a CASE_ID we have data only for two distinct YEAR_MTH, then all the
> entries should be considered; otherwise only the relevant entries will be
> considered for calculation.
> E.g. if the first YEAR_MTH entry is 201304 then get the number of changes
> till 201307 (inclusive); similarly, if the first YEAR_MTH entry is 201302
> then get the number of changes till 201305.
>
> Dataset
> CASE_ID YEAR_MTH ATT_1
> CB26A    201302         1
> CB26A    201302         0
> CB26A    201302         0
> CB26A    201303         1
> CB26A    201303         1
> CB26A    201304         0
> CB26A    201305         1
> CB26A    201305         0
> CB26A    201306         1
> CB27A    201304         0
> CB27A    201304         0
> CB27A    201305         1
> CB27A    201306         1
> CB27A    201306         0
> CB27A    201307         0
> CB27A    201308         1
>
> The final dataset should look like
>
> ID_CASE    No.of changes
> CB26A        5
> CB27A        3
>
> where 'No.of changes' refers to the changes within 3 months (201302-201305 for
> CB26A and 201304-201307 for CB27A).
>
> How can this be done in R?
>
> Regards,
> Abhinaba Roy
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
