I have a drug-trial study dataset (attached image).

It is a large and complex dataset (at least to me), so I hope to be as clear 
as possible with my question.
The dataset is from a study where individuals are given drugs and followed up 
over a period spanning two consecutive years. Individuals do not start 
treatment on the same day. The day they start is marked "x" in the variable 
"drug_admin", as is the day they stop treatment in the following year.
There is another variable, "study_id", that I hope to populate as shown in the 
dataset, under the following conditions:

For every individual:
•    if the individual has entries showing that they received drugs on both the 
start and end dates (marked with "x"),
•    and the start of drug administration falls in month 2 or 3 and the end of 
administration falls in month 2 or 4,
•    then, using the date that marks the start of drug administration, populate 
the variable "study_id" in all the rows that fall within the period during 
which the individual was given drugs, excluding the row that marks the end of 
drug administration.

I have tried my best and explored several examples online, but I haven't 
managed to solve this. The dataset contains close to 6000 individuals spanning 
10 years, and my best bet was a loop, which keeps crashing R after running for 
close to 30 minutes. I have also read that dplyr may do the job, but my 
attempts have been in vain.

sample code
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
individual <- unique(df$ID)   # vector of individuals
datalength <- nrow(df)        # number of rows in data frame

for (i in seq_along(individual)) {
  is_ind <- df$ID == individual[i]
  # column 2 holds the date: capture this individual's start and end dates
  start_admin <- df[is_ind & df$year == 2007 & df$drug_admin %in% "x" &
                      df$month %in% c(2, 3), 2]
  end_admin   <- df[is_ind & df$year == 2008 & df$drug_admin %in% "x" &
                      df$month %in% c(2, 4), 2]
  # skip individuals without exactly one qualifying start and end
  if (length(start_admin) != 1 || length(end_admin) != 1) next

  for (j in 1:datalength) {
    if (df[j, 1] == individual[i] && df[j, 2] >= start_admin &&
        df[j, 2] < end_admin) {
      df[j, 6] <- start_admin  # populate respective row if condition is met
    }
  }
}

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

Above is the code that keeps failing.

Any help is highly appreciated.
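
For reference, the conditions listed above can be expressed without an 
explicit loop. The following is only a minimal base-R sketch on made-up toy 
data. The column name `date`, the toy values, the assumption that each 
individual has at most one qualifying start and one qualifying end, and the 
hard-coded years 2007 and 2008 (taken from the loop above) are assumptions, 
not facts about the real dataset.

```r
# Toy data; the column name `date` and all values are made up for illustration.
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2),
  date       = as.Date(c("2007-02-10", "2007-06-01", "2008-01-15", "2008-04-05",
                         "2007-05-02", "2007-09-01", "2008-02-20")),
  drug_admin = c("x", "", "", "x", "x", "", "x"),
  stringsAsFactors = FALSE
)
df$year     <- as.integer(format(df$date, "%Y"))
df$month    <- as.integer(format(df$date, "%m"))
df$study_id <- as.Date(NA)

# Rows that qualify as a start (2007, month 2 or 3) or an end (2008, month 2 or 4).
is_start <- df$drug_admin %in% "x" & df$year == 2007 & df$month %in% c(2, 3)
is_end   <- df$drug_admin %in% "x" & df$year == 2008 & df$month %in% c(2, 4)

# One row per individual that has BOTH a qualifying start and a qualifying end.
# Assumption: at most one qualifying start and one qualifying end per individual.
starts <- data.frame(ID = df$ID[is_start], start = df$date[is_start])
ends   <- data.frame(ID = df$ID[is_end],   end   = df$date[is_end])
win    <- merge(starts, ends, by = "ID")

# Look up each row's window by ID; rows of individuals without a window get NA.
idx   <- match(df$ID, win$ID)
start <- win$start[idx]
end   <- win$end[idx]

# Inside the window: on or after the start date and strictly before the end date.
in_window <- !is.na(idx) & df$date >= start & df$date < end
df$study_id[in_window] <- start[in_window]

print(df)
```

On the toy data, the first three rows of individual 1 receive the start date 
2007-02-10, the row marking the end of administration stays NA, and individual 
2 (whose start falls in May) is left untouched.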

