Actually the ".0" on the first variable is not needed. You could modify the reshape() call to search for the base name of each variable so you would not need to change the code if the number of replications changes:
reshape(df5, direction="long", v.names=c("dose", "resp"),
        varying=list(dose=grepl("dose", names(df5)),
                     resp=grepl("resp", names(df5)) ) )

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, July 23, 2013 1:12 PM
To: David Winsemius
Cc: R help; Andrea Lamont
Subject: Re: [R] flexible approach to subsetting data

On Jul 23, 2013, at 10:49 AM, David Winsemius wrote:

>
> On Jul 23, 2013, at 10:01 AM, Adams, Jean wrote:
>
>> Check out the reshape() function of the reshape package. Here's one of the
>> examples from ?reshape.
>>
>> Jean
>>
>>
>> library(reshape)   # No, at least not for the reshape-function
>
> The reshape function is from the 'stats' package that ships with base R. The 'reshape' and 'reshape2' packages were written (at least in part) because the reshape function was so difficult to understand.
>
> If you do choose to use the reshape2 package, which is well-respected and often extremely helpful, the function you will want to start with is 'melt'.
>
>
>> long <- reshape(wide, direction="long")
>
> I don't think this example will be particularly helpful since the initial direction is "long" (from "wide") and more input would be needed.
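The point about the ?reshape example can be checked with the Indometh data used on that help page (Indometh ships with R's datasets package). The sketch below is a minimal illustration with arbitrary object names: reshape(direction = "wide") stores its settings in a "reshapeWide" attribute, which is what lets the bare reverse call work, while wide data built any other way needs the varying columns spelled out.

```r
## Indometh (datasets package) is in long form: one row per Subject x time.
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")

## The wide result carries a "reshapeWide" attribute recording how it was
## built, so the reverse call needs no further arguments.
has_attr <- !is.null(attr(wide, "reshapeWide"))
long <- reshape(wide, direction = "long")

## Drop the attribute to mimic wide data that was built some other way
## (as in the original question): the bare call now stops with an error.
plain <- wide
attr(plain, "reshapeWide") <- NULL
res <- tryCatch(reshape(plain, direction = "long"),
                error = function(e) e)
bare_call_failed <- inherits(res, "error")

## Supplying the extra input explicitly: column 1 is Subject and
## columns 2:12 hold conc at the 11 sampling times.
long2 <- reshape(plain, direction = "long", idvar = "Subject",
                 varying = list(2:12), v.names = "conc",
                 timevar = "time", times = sort(unique(Indometh$time)))
```

The explicit call is the kind of "more input" that any data not produced by reshape() itself will need.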
Here's a dataset to experiment with:

df5 <- data.frame(dose.0 = c(40,50,60,50), resp.0 = c(40,50,60,50),
                  dose.1 = c(1,2,1,2),     resp.1 = c(1,2,1,2)+3,
                  dose.2 = c(2,1,2,1),     resp.2 = c(1,2,1,2)+3,
                  dose.3 = c(3,3,3,3),     resp.3 = c(1,2,1,2)+3 )

Notice that you would need to add the ".0" to the column names.

reshape(df5, direction="long", v.names=c("dose", "resp"),
        varying=list(dose=c(1,3,5,7), resp=c(2,4,6,8) ) )
# succeeds

So perhaps you could use a similar call (after appending the ".0"s) with:

varying=list(sim=seq(1,810,by=4), X1=seq(2,810,by=4),
             X2=seq(3,810,by=4), X3=seq(4,810,by=4) )

>
>
>> wide
>> long
>>
>>
>>
>> On Tue, Jul 23, 2013 at 9:35 AM, Andrea Lamont <alamont...@gmail.com> wrote:
>>
>>> Hello:
>>>
>>> I am running a simulation study and am stuck with a subsetting problem.
>>>
>>> Here is the basic issue:
>>> I generated data and am running a simulation that uses multiple imputation.
>>> For each generated dataset, I used multiple imputation. The resultant
>>> dataset is in wide form, where each imputation is recorded as a separate
>>> column (though the different simulations are stacked). Here is an example
>>> of what it looks like:
>>>
>>> sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
>>>   1  #  #  #     #    #    #    #
>>>   1  #  #  #     #    #    #    #
>>>   1  #  #  #     #    #    #    #
>>>   2  #  #  #     #    #    #    #
>>>   2  #  #  #     #    #    #    #
>>>   2  #  #  #     #    #    #    #
>>>
>>> sim refers to the simulated/generated dataset. X1-X3 are the values for the
>>> first imputed dataset, X1.1-X3.1 are the values for the second imputed
>>> dataset.
>>>
>>> The problem is that I want the data to be in long format, like this:
>>>
>>> sim m X1 X2 X3
>>>   1 1  #  #  #
>>>   1 2  #  #  #
>>>   2 1  #  #  #
>>>   2 2  #  #  #
>>>
>>> where m is the imputation number.
>>> This will allow me to do cleaner calculations (e.g. X3-X1).
>>>
>>> I know I can subset the data manually - e.g. [,1:10] and save this to
>>> separate datasets then rbind; however, I'm looking for a more flexible
>>> approach to do this.
>>> This manual approach would be quite tedious as the number
>>> of imputations (and therefore the number of columns) increases (with only 10
>>> imputations, there are roughly 810 columns). Also, I would like to
>>> avoid having to recode each time I change the number of imputations.
>>>
>>> The same is true for the reshape function, which would require naming
>>> a huge number of columns and edits each time 'm' changes.
>
> If the columns are named regularly, then 'reshape' will attempt to split properly without an explicit naming. Details and a better description of the problem might allow more specific answers to emerge. The fact that the first instances have no numeric indicators may be a problem for the algorithm.
>
> Why not post dput(head( dfrm[ ,1:12]))
>
> --
> David.
>
>>>
>>>
>>> Is there a flexible way to approach this? I'm inclined to use a for loop,
>>> but know that 1) this is generally inefficient and 2) am having trouble
>>> with the coding regardless.
>>>
>>> Any suggestions are appreciated.
>>>
>>> Thanks,
>>> Andrea
>>>

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
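A minimal sketch of the base-name idea from the first reply, applied to the layout in the original question, where the first imputation's columns (sim, X1, X2, X3) carry no suffix and later ones end in .1, .2, and so on. The data frame imp and its values are hypothetical. The code assumes the repeated columns appear in imputation order and that every base name occurs once per imputation. It uses grep(value = TRUE) with an anchored pattern instead of grepl("X1", ...), because with dozens of variables an unanchored "X1" would also pick up X10, X11, and so on.

```r
## Hypothetical data in the layout from the original question: the first
## imputation has unsuffixed names, later imputations end in ".1", ".2", ...
imp <- data.frame(sim   = c(1, 1, 2, 2),
                  X1    = c(1, 2, 3, 4),
                  X2    = c(5, 6, 7, 8),
                  X3    = c(9, 10, 11, 12),
                  sim.1 = c(1, 1, 2, 2),
                  X1.1  = c(2, 3, 4, 5),
                  X2.1  = c(6, 7, 8, 9),
                  X3.1  = c(10, 11, 12, 13))

## Base names of the variables that repeat once per imputation.
base <- c("sim", "X1", "X2", "X3")

## For each base name, collect its columns by name. The anchored pattern
## "^X1(\\.[0-9]+)?$" matches "X1" and "X1.1" but not "X10" or "X10.1".
## Assumes the columns already appear in imputation order.
varying <- lapply(base, function(b)
  grep(paste0("^", b, "(\\.[0-9]+)?$"), names(imp), value = TRUE))
names(varying) <- base

## Every base name must occur once per imputation.
n_imp <- unique(lengths(varying))
stopifnot(length(n_imp) == 1)

## "row" is a hypothetical id column that reshape() creates (1..nrow(imp)).
long <- reshape(imp, direction = "long", varying = varying,
                v.names = base, timevar = "m", times = seq_len(n_imp),
                idvar = "row")

## Sort and keep the columns in the requested order: sim, m, X1, X2, X3.
long <- long[order(long$sim, long$m, long$row), c("sim", "m", base[-1])]
long$diff <- long$X3 - long$X1   # e.g. the X3 - X1 calculation
```

Because n_imp is read off the column names, the same code should run unchanged when the number of imputations changes. If the column order were not guaranteed, the matched names could be sorted by their numeric suffix before calling reshape().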