> On Aug 25, 2015, at 10:17 AM, Sam Albers <tonightstheni...@gmail.com> wrote:
>
> Hi all,
>
> This is a process question. How do folks efficiently identify column
> numbers in a dataframe without manually counting them. For example, if I
> want to choose columns from the iris dataframe I know of two options. I can
> do this:
>
>> str(iris)
> 'data.frame': 150 obs. of 5 variables:
>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
>  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
>  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
>  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
>  $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
>
> or this:
>
>> names(iris)
> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
>
> Neither option explicitly identifies the column number so that I can
> do something like this:
>
> iris[,c(2,4)]
>
> I feel like there must be a better way to do this so I wanted to ask
> the collective wisdom here what people do to accomplish this.
> Obviously this is a trivial example, but the issue really becomes
> problematic when you have a large dataframe.
>
> Thanks in advance!
>
> Sam

Just use ?subset:

  NewDF <- subset(iris, select = c(Sepal.Width, Petal.Width))

which is the same as:

  NewDF <- iris[, c(2, 4)]

You can also define sequential columns using ":", thus:

  NewDF <- subset(iris, select = c(Sepal.Width:Petal.Width))

is the same as:

  NewDF <- iris[, 2:4]

and use combinations of the two approaches as well.

You can also negate the selection by using:

  select = -c(…)

That avoids having to worry about using integer indices.

Regards,

Marc Schwartz
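
As a rough check of the equivalences described above, something like the following could be run in a fresh R session. It is only a sketch: the variable names (by_name, by_index, and so on) are made up for illustration, and the match() and setNames() lookups at the end are an addition that the reply itself does not mention, shown for cases where the integer positions are still wanted.

```r
# Sketch only; all object names here are illustrative.

# Selecting by name with subset() versus by integer position.
by_name  <- subset(iris, select = c(Sepal.Width, Petal.Width))
by_index <- iris[, c(2, 4)]
same_pair <- isTRUE(all.equal(by_name, by_index))

# A run of adjacent columns using ":" on the column names.
range_by_name <- subset(iris, select = Sepal.Width:Petal.Width)
same_range <- isTRUE(all.equal(range_by_name, iris[, 2:4]))

# Negating the selection drops the named columns instead.
dropped <- subset(iris, select = -c(Sepal.Length, Species))
same_dropped <- isTRUE(all.equal(dropped, iris[, 2:4]))

# If the positions themselves are needed, match() looks them up by name.
col_pos <- match(c("Sepal.Width", "Petal.Width"), names(iris))

# A named vector of positions can serve as a lookup table for wide data frames.
col_index <- setNames(seq_along(iris), names(iris))
picked <- col_index[c("Sepal.Width", "Petal.Width")]
```

Printing col_index shows every column name next to its position, which may be easier to scan than the output of names() when a data frame has many columns.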