[R] selecting dataframe columns based on substring of col name(s)

Evan Cooch Wed, 21 Jun 2017 11:39:35 -0700

Suppose I have the following sort of dataframe, where each column name has a common structure: prefix, followed by a number (for this example, col1, col2, col3 and col4):

d <- data.frame(col1=runif(10), col2=runif(10), col3=runif(10), col4=runif(10))

What I haven't been able to suss out is how to efficiently 'extract/manipulate/play with' columns from the data frame, making use of this common structure.

Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I could subset the dataframe d in any number of ways -- for example


piece <- d[,c("col2","col3","col4")]

Works as expected, but for *big* problems (where I might have dozens -> hundreds of columns -- often the case with big design matrices output by some linear models program or another), having to write them all out using c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering about is if there is a way to simply select over the "changing part" of the column name (you can do this relatively easily in a data step in SAS, for example). Heuristically, something like:


piece <- d[,col2:col4]

where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the prefix 'col', and then simply select over the changing suffix -- i.e., column number).
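
To make the heuristic concrete, here is a minimal base-R sketch of what "select over the changing suffix" might look like. It assumes the prefix is fixed and the suffixes are consecutive integers; the names `wanted` and `piece2` are just for illustration. The subset() form also assumes the columns sit next to each other, in order, in d.

```r
# Toy data frame with the same structure as d above
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# Build the column names from the fixed prefix and the varying suffix
wanted <- paste0("col", 2:4)
piece <- d[, wanted]

# subset() accepts a range of column names in its select argument;
# this selects by position, from col2 through col4 as they appear in d
piece2 <- subset(d, select = col2:col4)
```

The paste0() form selects by name, so it does not care where the columns are located in the data frame; the subset() form reads closer to the heuristic but depends on column order.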

Now, if I use the "to" function in the lessR package, I can get there from here fairly easily:


piece <- d[,to("col",4,from=2,same.size=FALSE)]

But, is there a better way? Beyond 'efficiency' (ease of implementation), part of what constitutes 'better' might be something in base R, rather than relying on a package?
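
As a rough illustration of the sort of base-R approach I have in mind, the sketch below matches column names by pattern and then filters on the numeric suffix. The helper `select_range` is hypothetical (it is not part of base R or lessR), and it assumes the prefix contains no regular-expression metacharacters and that every suffix is a plain integer.

```r
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# Hypothetical helper: keep columns named <prefix><integer> whose
# integer suffix lies between `from` and `to` (inclusive)
select_range <- function(df, prefix, from, to) {
  # names of the form <prefix><digits>, e.g. "col12"
  hits <- grep(paste0("^", prefix, "[0-9]+$"), names(df), value = TRUE)
  # strip the prefix and convert what is left to an integer
  suffix <- as.integer(sub(prefix, "", hits, fixed = TRUE))
  # drop = FALSE keeps a data frame even if only one column matches
  df[, hits[suffix >= from & suffix <= to], drop = FALSE]
}

piece <- select_range(d, "col", 2, 4)
```

Unlike a positional range, this does not require the columns to be adjacent or sorted, and columns with other prefixes are ignored.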


Thanks in advance...

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
