Coming from an Excel background, you may find copying and pasting attractive, 
but it does not create a reproducible record of what you did, so it becomes 
tiring and frustrating when you return to your analysis after some time has 
passed.

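Nitpick: in df[setdiff(names(df), "z")] the setdiff call is the only index. For 
a data frame that selects columns (list-style), not rows, but writing 
df[ , setdiff(names(df), "z")] puts it explicitly in the column position.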

Because R is programmable, there are far more ways to select columns than just 
setdiff. Since your description of the features you want is vague, you are 
unlikely to get exactly the answer you would like from your email. Some 
possibilities to think about:

a) Use regular expressions with grep or grepl to select columns whose names 
share a character pattern, e.g. all columns whose names include the substring 
"value" or "key": grep( "key|value", names( dta ) ). Very complex selection 
patterns are possible, but there are whole books on regular expressions, so you 
can't expect to learn all about them on this R-specific mailing list.

b) Use a separate csv file with one column listing each column name, then one 
column for each subset you want to define, with TRUE/FALSE values marking 
whether each named column belongs to that subset. E.g.

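# typically easier to manage in an external data file; inline here for example only
colsets <- read.csv( text=
"Colname,set1,set2
key,TRUE,TRUE
value1,TRUE,FALSE
value2,TRUE,FALSE
factor1,FALSE,TRUE
", header=TRUE, as.is=TRUE )
# index by name, so the result does not depend on colsets rows matching
# the column order of dta
dta[ , colsets$Colname[ colsets$set1 ] ]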
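With several subsets defined in such a file, one possible sketch turns every 
TRUE/FALSE column into a vector of column names in one step. The data frames 
below are hypothetical stand-ins for your data and for the result of 
read.csv("colsets.csv") (an assumed file name); the stopifnot check is an 
optional guard against names in the listing that no longer exist in the data.

```r
# Hypothetical data; in practice colsets would come from read.csv("colsets.csv")
dta <- data.frame(key = 1:2, value1 = c(1.5, 2.5), value2 = 3:4,
                  factor1 = c("a", "b"))
colsets <- data.frame(Colname = c("key", "value1", "value2", "factor1"),
                      set1 = c(TRUE, TRUE, TRUE, FALSE),
                      set2 = c(TRUE, FALSE, FALSE, TRUE),
                      stringsAsFactors = FALSE)

# One character vector of column names per TRUE/FALSE column (set1, set2, ...)
sets <- lapply(colsets[-1], function(flag) colsets$Colname[flag])
sets$set2  # c("key", "factor1")

# Stop early if the listing mentions a column the data does not have
stopifnot(all(colsets$Colname %in% names(dta)))

dta[ , sets$set2]
```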

Also, your criteria of "clean listing" and "copy-pasteable" are likely mutually 
exclusive, depending on how you interpret them. You might be able to use dput to 
export a set of column names that can be re-imported accurately, but you might 
not regard the result as "clean" if by that you mean "readable".
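A minimal sketch of that trade-off on a hypothetical data frame: dput prints a 
c(...) expression you can paste back into a script and trim, while cat with 
one name per line is easier to read but is not valid R code as it stands.

```r
# Hypothetical data frame; substitute your own
dta <- data.frame(key = 1:2, value1 = c(1.5, 2.5), value2 = 3:4,
                  factor1 = c("a", "b"))

# Re-importable: prints c("key", "value1", "value2", "factor1")
dput(names(dta))

# Readable, one name per line, but not something you can paste back as code
cat(names(dta), sep = "\n")

# After trimming the dput() output down to the columns you want to keep:
keep <- c("key", "value1")
dta[ , keep]
```

With 1000+ names, dput wraps its output over several lines, but the result is 
still a single expression that can be pasted back in.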

On April 23, 2017 12:07:19 PM PDT, Bruce Ratner PhD <b...@dmstat1.com> wrote:
>R-helpers:
>I'm reading "Advanced R" (Wickham), which provides his way, quoted
>below, of keeping variables. This cherry-picking approach clearly is
>not practical with a large dataset. 
>
>"If you know the columns you don’t want, use set operations to work out
>which columns to keep: df[setdiff(names(df), "z")]"
>
>I'm looking for a way of producing an output of 1000 plus variables,
>such that I can get a clean listing of variables, not like from str(),
>that are easily copy-pastable for selecting the variables I want to
>keep. 
>
>Any suggestion is appreciated.
>Thanks. 
>Bruce
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
