> On Jun 21, 2017, at 9:11 AM, Evan Cooch <evan.co...@gmail.com> wrote:
>
> > Suppose I have the following sort of dataframe, where each column name has a
> > common structure: prefix, followed by a number (for this example, col1, col2,
> > col3 and col4):
> >
> > d = data.frame( col1=runif(10), col2=runif(10), col3=runif(10),col4=runif(10))
> >
> > What I haven't been able to suss out is how to efficiently
> > 'extract/manipulate/play with' columns from the data frame, making use of
> > this common structure.
> >
> > Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I
> > could subset the dataframe d in any number of ways -- for example
> >
> > piece <- d[,c("col2","col3","col4")]
> >
> > Works as expected, but for *big* problems (where I might have dozens ->
> > hundreds of columns -- often the case with big design matrices output by some
> > linear models program or another), having to write them all out using
> > c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering about
> > is if there is a way to simply select over the "changing part" of the column
> > name (you can do this relatively easily in a data step in SAS, for example).
> > Heuristically, something like:
> >
> > piece <- df[,col2:col4]
> >
> > where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the
> > prefix 'col', and then simply select over the changing suffix -- i.e., column
> > number).
> >
> > Now, if I use the "to" function in the lessR package, I can get there from
> > here fairly easily:
> >
> > piece <- d[,to("col",4,from=2,same.size=FALSE)]
> >
> > But, is there a better way? Beyond 'efficiency' (ease of implementation),
> > part of what constitutes 'better' might be something in base R, rather than
> > relying on a package?
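A minimal base-R sketch for the question's data frame `d`, showing three ways to pick out col2 through col4 without typing every name. The `set.seed()` call and the names `piece1`, `piece2`, `piece3` and `rng` are illustrative additions, not from the thread; approach 2 assumes the columns are stored in the data frame in numeric order.

```r
# Data frame from the question; set.seed() is added only so runif() is
# reproducible.
set.seed(1)
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# 1. Build the names from the common prefix plus the changing numeric suffix.
piece1 <- d[, paste0("col", 2:4)]

# 2. Emulate "col2:col4" by turning the two end-point names into positions.
#    This assumes the columns sit in the data frame in that order.
rng <- match("col2", names(d)):match("col4", names(d))
piece2 <- d[, rng]

# 3. Base R's subset() accepts a name range directly in `select`.
piece3 <- subset(d, select = col2:col4)

identical(piece1, piece2)                    # TRUE
identical(as.list(piece1), as.list(piece3))  # TRUE: same columns and values
```

Approach 1 also covers suffixes that are not contiguous, e.g. `paste0("col", c(2, 4))`, which a positional range cannot express.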
After staring at the code for the base function subset, with a thought to
hacking it to do this, I realized that this should already be possible with
its current form:

names(airquality)
#[1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"

subset(airquality, Temp > 90,   # this is the row selection
       select = Ozone:Solar.R)  # and this selects columns
#--------
    Ozone Solar.R
42     NA     259
43     NA     250
69     97     267
70     97     272
75     NA     291
102    NA     222
120    76     203
121   118     225
122    84     237
123    85     188
124    96     167
125    78     197
126    73     183
127    91     189

Bert's advice to work with the column numbers is good, but converting the
column names in the `select` expression to their numeric positions is exactly
what `subset` already does internally.

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
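To make the point about `subset` concrete, here is a minimal sketch of the name-to-position trick, modeled on how `subset.data.frame` evaluates its `select` argument in base R. The helper `select_cols()` and the list `nl` are hypothetical names introduced for illustration; they are not part of base R.

```r
# Hypothetical helper, not part of base R: it mirrors how subset.data.frame
# evaluates its `select` argument.
select_cols <- function(df, select) {
  # Bind every column name to its position, e.g. list(Ozone = 1L, Solar.R = 2L, ...)
  positions <- as.list(seq_along(df))
  names(positions) <- names(df)
  # Evaluate the unevaluated `select` expression with those bindings, so a
  # name range such as Ozone:Solar.R turns into the integer range 1:2.
  idx <- eval(substitute(select), positions, parent.frame())
  df[, idx, drop = FALSE]
}

# The same lookup done by hand on airquality:
nl <- as.list(seq_along(airquality))
names(nl) <- names(airquality)
eval(quote(Ozone:Solar.R), nl)  # [1] 1 2
eval(quote(Temp:Day), nl)       # [1] 4 5 6

names(select_cols(airquality, Wind:Temp))         # [1] "Wind" "Temp"
names(select_cols(airquality, -(Ozone:Solar.R)))  # [1] "Wind" "Temp" "Month" "Day"
```

Because the names become plain integers before indexing, ordinary arithmetic such as negation works on them, which is why `select = -(Ozone:Solar.R)` drops columns.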