Re: [R] variable substitution in for loops

Uwe Ligges Mon, 01 Mar 2010 09:22:06 -0800


On 01.03.2010 14:27, Jon Erik Ween wrote:

Friends

First, thanks to all for great feedback. Open-source rocks! I have a workable 
solution to my question, attached below in case it might be of any use to 
anyone. I'm sure there are more elegant ways of doing this, so any further 
feedback is welcome!

Things I've learned (for other noobs like me to learn from):

1) dataset[[j]] seems equivalent to dataset$var if j<-var, though quotes can mess 
you up, hence j<-noquote(varlist[i]) in the script (it also makes a difference 
that variables in varlist be stored as a space-separated string. tab- or 
line-break-separated lists don't seem to work, though a different method might handle 
that)

dataset[["var"]] is "equivalent" to dataset$var given var does not contain any special characters. Otherwise j == "var" has to be TRUE.
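
To make the point above concrete, here is a minimal sketch using a made-up data frame (the names dataset, var and "odd name" are illustrative assumptions, not taken from the poster's data). It suggests that [[ takes the column name as an ordinary character string, and that noquote() only affects printing:

```r
# Hypothetical data frame; column names are illustrative only.
dataset <- data.frame(var = c(1, 2, 3), `odd name` = c(4, 5, 6),
                      check.names = FALSE)

j <- "var"                                   # an ordinary character string
same <- identical(dataset[[j]], dataset$var)

# A name with a space needs backticks after $, but [[ takes the string as-is.
k <- "odd name"
odd_same <- identical(dataset[[k]], dataset$`odd name`)

# noquote() changes how the string prints, not which column [[ selects.
nq_same <- identical(dataset[[noquote(j)]], dataset[[j]])
```

All three comparisons should come out TRUE, which is why the quotes themselves are not the problem when indexing with [[.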

2) Loops will abort if they encounter an error (like ROCR encountering a 
prediction that is singular). Error handling can be built in, but is a little 
tricky. I duplicated the method in a function to test and advance the loop 
on failure. (You can suppress error messages if you like.)


Not tricky, just use try().
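
A minimal sketch of the try() idiom, assuming a toy computation (sqrt() on made-up inputs) in place of the ROCR call; the names inputs and results are hypothetical:

```r
# Hypothetical inputs: the second element makes sqrt() raise an error.
inputs <- list(4, "not a number", 9)
results <- rep(NA_real_, length(inputs))

for (i in seq_along(inputs)) {
  res <- try(sqrt(inputs[[i]]), silent = TRUE)  # silent = TRUE hides the message
  if (inherits(res, "try-error")) next          # skip this case, keep looping
  results[i] <- res                             # reuse the value try() returned
}
results
```

Because try() returns the value of the expression when it succeeds, the computation does not need to be repeated after the check.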

3) Some stats methods don't have NA handling built into them (eg: "prediction" 
in ROCR chokes if there are empty cells in the variables) hence it seems a good idea to 
strip these out before starting. The subsetting with na.omit does this



... given you know what you are doing (and omitting).
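
One way to keep track of what gets omitted, sketched on a made-up data frame (incas_demo and its columns are assumptions for illustration only): subset to the variables of interest first, then inspect the "na.action" attribute that na.omit() attaches to its result.

```r
# Hypothetical stand-in for the real data set.
incas_demo <- data.frame(score       = c(1.2, NA, 3.4, 5.6),
                         Issue_class = c(0, 1, NA, 1),
                         other       = c(NA, 1, 1, 1))

vars <- c("score", "Issue_class")
temp <- na.omit(incas_demo[vars])        # drop rows with NA in these two columns only

n_kept    <- nrow(temp)                  # rows that survive
dropped   <- attr(temp, "na.action")     # row indices that were removed
n_dropped <- length(dropped)

# Omitting on the full data frame would also discard rows where only 'other' is NA.
n_kept_all <- nrow(na.omit(incas_demo))
```

Comparing n_kept with n_kept_all shows how much the choice of columns changes the number of cases left in the analysis.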

4) You reference pieces (slots) of results (S3/S4 objects) by using obj...@slot.


The @ operator is defined for slots of *S4* classes.
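
A small sketch of the distinction, assuming a toy S4 class (DemoPerf is hypothetical and only mimics the shape of a results object; it is not ROCR's class). S4 slots are read with @ and listed with slotNames(), while an S3 object such as an lm fit is a list whose components are read with $:

```r
library(methods)

# Hypothetical S4 class with two slots.
setClass("DemoPerf", representation(y.name = "character", y.values = "list"))
obj <- new("DemoPerf", y.name = "auc", y.values = list(0.83))

slots <- slotNames(obj)             # names of the available slots
auc   <- as.numeric(obj@y.values)   # @ extracts a slot from an S4 object

# An S3 object is an ordinary list with a class attribute: use $ (or [[ ]]).
fit        <- lm(dist ~ speed, data = cars)
coef_names <- names(fit$coefficients)
```

isS4(obj) is TRUE and isS4(fit) is FALSE, which is a quick way to tell which accessor applies.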


Best,
Uwe Ligges

> Hence, you pull out the auc value in ROCR-"performance" by p...@y.value in the script. You can see what slots are in an object by simply listing the object contents at the command line: > object.


Thanks again for all the help!

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
     University of Toronto Faculty of Medicine


...code




################################################################################
## R script for automating stats crunching in large datasets                  ##
## Needs space separated list of variable names matching dataset column names ##
## You have to tinker with the code to customize for your application         ##
##                                                                            ##
## Jon Erik Ween MD, MSc,  26 Feb 2010                                        ##
################################################################################

library(ROCR) # Load stats package to use if not standard
varslist<-scan("/Users/jween/Desktop/INCASvars.txt","list") # Read variable list
results<-as.data.frame(array(,c(3,length(varslist)))) # Initialize results array, one type of stat at a time for now

for (i in seq_along(varslist)){ # Loop through the variables you want to process. Determined by varslist
        j<-noquote(varslist[i])
        vars<-c(varslist[i],"Issue_class") # Variables to be analyzed
        temp<-na.omit(incas[vars]) # Have to subset to get rid of NA values causing ROCR to choke
        n<-nrow(temp) # Record how many cases the analysis is based on. Need to figure out how to calc cases/controls
        #.table<-table(temp$SubjClass)  # Maybe for later figure out cases/controls
        results[1,i]<-j # Name particular results column
        results[2,i]<-n # Number of subjects in analysis
        test<-try(aucval(i,j),silent=TRUE) # Error handling in case procedure craps out, so loop can continue. Suppress annoying error messages
        if(inherits(test,"try-error")) next # Run procedure only if OK, otherwise skip
        pred<-prediction(incas[[j]],incas$Issue_class) # Procedure
        perf<-performance(pred,"auc")
        results[3,i]<-as.numeric(p...@y.values) # Enter result into appropriate row
}
write.table(results,"/Users/jween/Desktop/IncasRres_ Issue_class.csv",sep=",",col.names=FALSE,row.names=FALSE) # Write results to table
rm(aucval,i,n,temp,vars,results,test,pred,perf,j,varslist) # Clean up

aucval<-function(i,j){ # Function to trap errors. Should be the same as real procedure above
        pred<-prediction(incas[[j]],incas$Issue_class) # Don't put any real results here, they don't seem to be passed back
        perf<-performance(pred,"auc")
}







...end

On 2010-02-24, at 9:19 PM, Dennis Murphy wrote:

Hi:

The plyr package may come in handy here, as it allows you to create functions
based on the variables (and their names) in the data frame. Here's a simple,
cooked-up example that shows a couple of ways to handle this class of problem:

(1) Create three simple data frames with the same set of variables,
      coincidentally in the same order, although that shouldn't really matter
      since we're referencing by name rather than position:

library(plyr)
a<- data.frame(x = sample(1:50, 10, replace = TRUE),
                 y = rpois(10, 30),
                 z = rnorm(10, 15, 5))
b<- data.frame(x = sample(1:50, 10, replace = TRUE),
                 y = rpois(10, 30),
                 z = rnorm(10, 15, 5))
d<- data.frame(x = sample(1:50, 10, replace = TRUE),
                 y = rpois(10, 30),
                 z = rnorm(10, 15, 5))

(2)  rbind the three data frames and assign an indicator to differentiate the
       individual data frames:

dd<- rbind(a, b, d)
dd$df<- rep(letters[c(1, 2, 4)], each = 10)

(3) Use the ddply() function: .(df) refers to the grouping variable, summarise
      indicates that we want to compute a groupwise summary, and the
      remaining code defines the desired summaries (by variable name).

ddply(dd, .(df), summarise, avgx = mean(x), avgz = mean(z))
   df avgx     avgz
1  a 28.3 17.27372
2  b 28.0 14.32962
3  d 20.3 13.26147

(4) If we create a list of data frames instead, we can accomplish the same
     task by using ldply() [list to data frame as the first two characters]
     instead. Since we have a list as input, there's no need for a group
     indicator as the list components comprise the 'groups'.

l<- list(a, b, d)
ldply(l, summarise, avgx = mean(x), avgz = mean(z))

   avgx     avgz
1 28.3 17.27372
2 28.0 14.32962
3 20.3 13.26147

These represent two ways that you can produce summaries by variable name
for multiple data frames. The rbind construct works if all of the data frames
have the same variables in the same order; if not, the list approach in (4) is
better. To see this,

e<- data.frame(y = rpois(10, 30), z = rnorm(10, 15, 5),
                  x = sample(1:50, 10, replace =TRUE))
l<- list(a, b, d, e)
ldply(l, summarise, avgx = mean(x), avgz = mean(z))
   avgx     avgz
1 28.3 17.27372
2 28.0 14.32962
3 20.3 13.26147
4 29.9 13.64617

plyr is not the only package you could use for this. The doBy package with
function summaryBy() would also work, and you could also use the aggregate()
function. The advantage of plyr and doBy is that the code is a bit tighter and
easier to understand.
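
For comparison, a minimal base-R sketch of the aggregate() route mentioned above. The data are made up (the seed and the values are assumptions for illustration), with df playing the same role of group indicator as in step (2):

```r
set.seed(42)  # arbitrary seed so the made-up data are reproducible

# Hypothetical stacked data frame with a group indicator.
dd <- data.frame(x  = sample(1:50, 30, replace = TRUE),
                 z  = rnorm(30, 15, 5),
                 df = rep(c("a", "b", "d"), each = 10))

# Group means of x and z, one row per level of df.
agg <- aggregate(cbind(x, z) ~ df, data = dd, FUN = mean)
names(agg) <- c("df", "avgx", "avgz")
agg
```

The result has one row per group, much like the ddply() output, although every summary here uses the same function (mean).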



On Wed, Feb 24, 2010 at 5:18 PM, Jon Erik Ween<jw...@klaru-baycrest.on.ca> wrote:
Friends

I can't quite find a direct answer to this question from the lists, so here 
goes:

I have several dataframes, 200+ columns 2000+ rows. I wish to script some operations to 
perform on some of the variables (columns) in the data frames not knowing what the column 
number is, hence have to refer by name. I have variable names in a text file 
"varlist". So, something like this:

for (i in 1:length(varlist)){
        j<-varlist[i]
        v<-mean(Dataset[[j]])
        print(v)
}

When you think of writing code like this, you should think "apply family". R
performs vectorized operations, and you'll become more efficient when you start
thinking about how to vectorize rather than how to loop...
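
As a rough sketch of that advice, assuming a small made-up Dataset and a character vector varlist (both hypothetical), the loop can be replaced by a single sapply() call over the selected columns:

```r
# Hypothetical data and variable list.
Dataset <- data.frame(Var1 = c(1, 2, 3),
                      Var2 = c(10, 20, 30),
                      Var3 = c(5, 5, 5))
varlist <- c("Var1", "Var3")

# Select the columns by name, then apply mean() to each one.
means <- sapply(Dataset[varlist], mean)

# For plain column means, colMeans() gives the same numbers.
means2 <- colMeans(Dataset[varlist])
```

The result is a named numeric vector, so each mean stays attached to its variable name.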


Now, if I force it

j<-"Var1"
v<-mean(Dataset[[j]])
print(v)

then it works, but not if I read the varlist as above.

Looking at "j" I get:

print(j)

V1
1 Var1

Hence there is a lot of other stuff read into "j" that confuses "mean". I can't
figure out how to just get the value of the variable and nothing else. I've tried space separated,
comma separated, tab separated lists and all give the same error. I've tried get(), parse()... no
go.
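
The printed output above looks like a one-cell data frame rather than a plain string. A minimal sketch of that situation, assuming the list was read with read.table() (an assumption; the original post does not say how the file was read) and using inline text in place of the file:

```r
txt <- "Var1 Var2 Var3"   # hypothetical file contents

# read.table() returns a data frame; single-bracket indexing keeps it one.
varlist_df <- read.table(text = txt, stringsAsFactors = FALSE)
j_df <- varlist_df[1]                 # prints with a V1 header, like the output above
still_df <- is.data.frame(j_df)

# scan() with what = "character" returns a plain character vector.
varlist <- scan(text = txt, what = "character", quiet = TRUE)
j <- varlist[1]

# Alternatively, [[ ]] pulls the bare value out of the data frame.
j2 <- varlist_df[[1]]

Dataset <- data.frame(Var1 = c(2, 4, 6))   # hypothetical data
v <- mean(Dataset[[j]])
```

With j as a plain string, Dataset[[j]] selects the column and mean() gets a numeric vector.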

Any suggestions?

Thanks a lot

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644
Baycrest Centre
3560 Bathurst Street
Toronto, Ontario M6A 2E1
Canada

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jw...@klaru-baycrest.on.ca


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


