It is hard to imagine a use for putting character representations of dates in certain rows of a column. Identifying start and end dates, on the other hand, seems reasonable enough. It can be accomplished using aggregate from base R (fewer external dependencies) or summarise from dplyr (faster, simpler syntax):

result <- setNames( data.frame( aggregate( date~ID, data=drug_study, FUN=min ),
                                aggregate( date~ID, data=drug_study, FUN=max )[2] ),
                    c( "ID", "start", "end" ) )

or

library( dplyr )
result <- ( drug_study %>%
            group_by( ID ) %>%
            summarise( start=min( date ), end=max( date ) ) )

--
Sent from my phone. Please excuse my brevity.

On July 3, 2016 5:19:01 AM PDT, Kevin Wamae <kwa...@kemri-wellcome.org> wrote:
>Hi John, attached is the file in txt. Kindly let me know if it fails
>again..
>
>Regards
>-------------------------------------------------------------------------------
>Kevin Wame | Ph.D. Student (IDeAL)
>KEMRI-Wellcome Trust Collaborative Research Programme
>Centre for Geographic Medicine Research
>P.O. Box 230-80108, Kilifi, Kenya
>
>
>On 7/3/16, 3:16 PM, "John Kane" <jrkrid...@inbox.com> wrote:
>
>The data set did not show up. The R-help list tends to strip out most
>file types as a safety precaution. Try renaming the file from xxx.csv
>to xxx.txt and it should come through alright.
>
>John Kane
>Kingston ON Canada
>
>
>> -----Original Message-----
>> From: kwa...@kemri-wellcome.org
>> Sent: Sun, 3 Jul 2016 09:39:59 +0000
>> To: jdnew...@dcn.davis.ca.us, r-help@r-project.org
>> Subject: Re: [R] R - Populate Another Variable Based on Multiple
>> Conditions | For a Large Dataset
>>
>> Hi Jeff, pardon me, I was surely not making it easy. I hope this time I
>> will ☺
>>
>> Attached is snippet of the dataset in csv format and below is the
>> R.script I have managed so far.
>>
>> ----------------------------------------------------------------------
>>
>> drug_study <- read.csv("drug_study.csv", header = T); head(drug_study)
>> drug_study$date <- as.Date(drug_study$date, "%m/%d/%Y")
>> drug_study$study_id <- "" #create new column
>>
>> individual <- unique (drug_study$ID) #vector of individuals
>> datalength <- dim(drug_study)[1] #number of rows in dataframe
>>
>> for (i in 1:length(individual)) {
>>   for (j in 1:datalength) {
>>     start_admin <- drug_study[c(drug_study$ID == individual[i] &
>>       drug_study$year == 2007 & drug_study$drug_admin == "Y" &
>>       drug_study$month == 5),2] #capture date of start
>>     end_admin <- drug_study[(drug_study$ID == individual[i] &
>>       drug_study$year == 2008 & drug_study$drug_admin == "Y" &
>>       drug_study$month == 2),2] #capture date of end
>>
>>     if(drug_study[j,1] == individual[i] & drug_study[j,2] >= start_admin
>>        & drug_study[j,2] < end_admin) {
>>       drug_study[j,6] <- paste(start_admin) #populate respective row if condition is met
>>     }
>>   }
>> }
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> For this dataset, there exists three individuals, J1/3, R1/3, R10/1.
>>
>> The script works for the last two individuals but not J1/3 with the error
>> below:
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Error in if (drug_study[j, 1] == individual[i] & drug_study[j, 2] >=
>> start_admin & :
>>   argument is of length zero
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> I figured it’s because this individuals start_admin and end_admin dates
>> aren’t captured because the if-loop fails. There’s my first problem,
>> there are thousands of individuals with varying
>> start_admin and end_admin dates and I need a script to capture these for
>> every individual.
>>
>> Secondly, the above script is taking almost an hour to run for the entire
>> dataset, just for the individuals whose start_admin and end_admin dates
>> can be captured by the if-loop.
>>
>> I need help in coming up with a script that will tackle the problem
>> taking into account the different start_admin and end_admin dates and be
>> resourceful with regards to time.
>>
>> Regards
>> -------------------------------------------------------------------------------
>> Kevin Kariuki
>>
>> ######################################################################
>>
>> On 7/3/16, 8:42 AM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
>>
>> You are making this hard on yourself by not paying attention the Posting
>> Guide listed in the footer of every email on this list. You would
>> probably also find [1] helpful also.
>>
>> [1]
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On July 2, 2016 3:41:07 PM PDT, Kevin Wamae <kwa...@kemri-wellcome.org>
>> wrote:
>> >Hi Jeff, sorry for referring to you as Jennifer earlier, accept my
>> >apologies.
>> >
>> >I attached a sample dataset in the question, am afraid it must have
>> >failed to attach.
>> >
>> >I have attached it again..
>> >
>> >Regards
>> >-------------------------------------------------------------------------------
>> >Kevin Kariuki
>> >
>> >On 7/2/16, 7:37 PM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
>> >
>> >I can understand you not wanting to supply your actual data online, but
>> >only you know what your data looks like so only you can create a
>> >simulated data set that we could show you how to work with.
>> >--
>> >Sent from my phone. Please excuse my brevity.
>> >
>> >On July 2, 2016 2:57:39 AM PDT, Kevin Wamae <kwa...@kemri-wellcome.org>
>> >wrote:
>> >>I have a drug-trial study dataset (attached image).
>> >>
>> >>Since its a large and complex dataset (at least to me) and I hope to be
>> >>as clear as possible with my question.
>> >>The dataset is from a study where individuals are given drugs and
>> >>followed up over a period spanning two consecutive years. Individuals
>> >>do not start treatment on the same day and once they start, the
>> >>variable "drug-admin" is marked "x" as well as the time they stop
>> >>treatment in the following year.
>> >>There exists another variable, "study_id", that I hope to populate as
>> >>can be seen in the dataset, with the following conditions:
>> >>
>> >>For every individual
>> >>• if the individual has entries that show they received drugs both
>> >>  on the start and end date (marked with the "x")
>> >>• if the start of drug administration falls in month == 2 | 3 and
>> >>  end of administration falls in month == 2 | 4
>> >>• then, using the date that marks the start of drug administration,
>> >>  populate the variable _"study_id"_ in all the rows that fall within the
>> >>  timeframe that the individual was given drugs but excluding the end of
>> >>  drug administration.
>> >>I have tried my level best and while I have explored several examples
>> >>online, I haven't managed to solve this. The dataset contains close to
>> >>6000 individuals spanning 10 years and my best bet was to use a loop
>> >>which keeps crushing R after running for close to 30min. I have also
>> >>read that dplyr may do the job but my attempts have been in vain.
>> >>
>> >>sample code
>> >>----------------------------------------------------------------------
>> >>individual <- unique (df$ID) #vector of individuals
>> >>datalength <- dim(df)[1] #number of rows in dataframe
>> >>
>> >>for (i in 1:length(individual)) {
>> >>  for (j in 1:datalength) {
>> >>start_admin <- df[(df$year == 2007] & df$drug_admin == "x" &
>> >>c(df$month == 2 | df$month == 3),1] #capture date of start
>> >>end_admin <- df[(df$year == 2008] & df$drug_admin == "x" &
>> >>c(df$month == 2 | df$month == 4),1] #capture date of end
>> >>
>> >>if(df[datalength,1] == individual(i) & df[datalength,2] >= start_admin
>> >>& df[datalength,2] < end_admin) {
>> >>df[datalength,6] <- start_admin #populate respective row if condition is met
>> >>    }
>> >>  }
>> >>}
>> >>
>> >>----------------------------------------------------------------------
>> >>
>> >>Above is the code that keeps failing..
>> >>
>> >>Any help is highly appreciated....
>> >>
>> >>______________________________________________________________________
>> >>
>> >>This e-mail contains information which is confidential. It is intended
>> >>only for the use of the named recipient. If you have received this
>> >>e-mail in error, please let us know by replying to the sender, and
>> >>immediately delete it from your system. Please note, that in these
>> >>circumstances, the use, disclosure, distribution or copying of this
>> >>information is strictly prohibited. KEMRI-Wellcome Trust Programme
>> >>cannot accept any responsibility for the accuracy or completeness of
>> >>this message as it has been transmitted over a public network.
>> >>Although
>> >>the Programme has taken reasonable precautions to ensure no viruses are
>> >>present in emails, it cannot accept responsibility for any loss or
>> >>damage arising from the use of the email or attachments. Any views
>> >>expressed in this message are those of the individual sender, except
>> >>where the sender specifically states them to be the views of
>> >>KEMRI-Wellcome Trust Programme.
>> >>______________________________________________________________________
>> >>
>> >>------------------------------------------------------------------------
>> >>
>> >>______________________________________________
>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>> >>PLEASE do read the posting guide
>> >>http://www.R-project.org/posting-guide.html
>> >>and provide commented, minimal, self-contained, reproducible code.
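
For what it is worth, here is a minimal sketch of how the start and end dates computed at the top of this message could be joined back onto the rows, so that each row is tested against its own individual's window without a loop. The data are made up (only the ID values come from the thread), the names admin, flagged and in_window are hypothetical, and I am assuming the window runs from the first to the last row with drug_admin == "Y" for each ID, which may not match the real month/year rules:

```r
# Made-up stand-in for the real file; dates and drug_admin values are invented.
drug_study <- data.frame(
  ID = c("J1/3", "J1/3", "J1/3", "R1/3", "R1/3", "R1/3"),
  date = as.Date(c("2007-05-10", "2007-09-01", "2008-02-20",
                   "2007-05-15", "2007-11-03", "2008-02-25")),
  drug_admin = c("Y", "N", "Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Assumption: only rows where a drug was administered define the window.
admin <- drug_study[drug_study$drug_admin == "Y", ]

# Per-ID earliest and latest administration date (base R only).
result <- setNames(
  data.frame(
    aggregate(date ~ ID, data = admin, FUN = min),
    aggregate(date ~ ID, data = admin, FUN = max)[2]
  ),
  c("ID", "start", "end")
)

# Attach each ID's window to every one of its rows, then flag the rows
# that fall in [start, end), i.e. excluding the end date itself.
flagged <- merge(drug_study, result, by = "ID")
flagged$in_window <- flagged$date >= flagged$start & flagged$date < flagged$end
flagged <- flagged[order(flagged$ID, flagged$date), ]

print(flagged)
```

Rows with in_window TRUE fall inside the window, and start stays a Date column instead of a character string. If the window has to be restricted to particular months or years, filter admin further before aggregating.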