Re: [R] many datasets run with one R script in a computer cluster

David Winsemius Fri, 08 Oct 2010 11:31:25 -0700


On Oct 8, 2010, at 12:33 PM, Martin Hughes wrote:

Hello Everyone
I have an R script (and a source file in which I keep my functions) that I need to run on 70 data sets (each consisting of a pair of files).
I wish to run these data sets on a computer cluster that is run by my uni (HOWEVER they cannot help me with this problem but say it is do-able).
The cluster is clever enough that if I set my data up as follows: within one folder called 'work' there are 70 subfolders, each of which contains a pair of files, each pair of files having a unique first part (e.g. CottonEA05 as in the example text below),
then if I have one R script to run the analysis within the main folder, it will open each subfolder, run the R script and output the results into that subfolder.
The problem is that this script for R needs to have some kind of wild card element, so for example in the script below, R will replace CottonEA05 with whatever the unique identifier is for the particular subfolder it's looking through, e.g. change it to Martin_M_STAGE.txt or bananas_M_STAGE.txt etc.
Can R do this? i.e. can it look at a file title, change the file name within the script to be the same as that file title, and then run the analysis?

It can certainly read a directory and return the file names in a vector. And you can certainly use sub() on that vector to keep only the leading characters before the first occurrence of a given character.


?list.files

(Which also has pattern-matching facilities through its second argument, pattern.)
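
As a minimal sketch of that second argument: pattern takes a regular expression, so it can pick out just the file whose name ends in _M_STAGE.txt. The demo folder and the three file names below are made up for illustration (only the CottonEA05 names come from the question), and they are created in a temporary directory so the snippet runs anywhere.

```r
# Hypothetical demo folder standing in for one dataset subfolder.
demo_dir <- file.path(tempdir(), "CottonEA05_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("CottonEA05_M_STAGE.txt",
                                  "CottonEA05_D_STAGE.txt",
                                  "notes.txt")))

# 'pattern' is a regular expression: keep only names ending in _M_STAGE.txt.
# The dot is escaped so it matches a literal period.
m_files <- list.files(demo_dir, pattern = "_M_STAGE\\.txt$")
print(m_files)
```

Called with no path, list.files() looks in the current working directory, which is presumably what you want if the cluster starts R inside each subfolder.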

The following reads the files in my working directory and then returns only the characters before the first period:


> filist <- list.files()
> str(filist)
 chr [1:295] "_train_1.dat" "~Show.Dot.Files.txt" ...
> first <- sub("\\..+$","", filist)
> str(first)
 chr [1:295] "_train_1" "~Show" "~UCONN" "2001VBTANB" ...

Was that what you were asking?
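
To connect this to the script in the question, here is a minimal sketch. It assumes each subfolder holds exactly one file ending in _M_STAGE.txt and that the matching _D_STAGE.txt file shares its prefix. The helper name dataset_prefix, the demo folder, and the toy data are all hypothetical; only the file-name suffixes and the M/rownames steps come from the posted script.

```r
# Hypothetical helper: recover the dataset prefix (e.g. "CottonEA05")
# from the single *_M_STAGE.txt file found in 'dir'.
dataset_prefix <- function(dir = ".") {
  m_file <- list.files(dir, pattern = "_M_STAGE\\.txt$")
  if (length(m_file) != 1L) {
    stop("expected exactly one *_M_STAGE.txt file in ", dir)
  }
  sub("_M_STAGE\\.txt$", "", m_file)
}

# Toy data in a temporary folder so the sketch is self-contained.
demo_dir <- file.path(tempdir(), "bananas_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(taxon = c("a", "b"), c1 = c(0, 1), c2 = c(1, 1)),
            file.path(demo_dir, "bananas_M_STAGE.txt"),
            col.names = FALSE, row.names = FALSE)
write.table(data.frame(taxon = c("a", "b"), bin1 = c(1, 0)),
            file.path(demo_dir, "bananas_D_STAGE.txt"),
            row.names = FALSE)

# Build both file names from the prefix instead of hard-coding them.
prefix <- dataset_prefix(demo_dir)
m  <- read.table(file.path(demo_dir, paste0(prefix, "_M_STAGE.txt")))
pa <- read.table(file.path(demo_dir, paste0(prefix, "_D_STAGE.txt")),
                 header = TRUE)

# Same steps as in the posted script: first column becomes the row names.
M <- as.matrix(m[, -1])
rownames(M) <- m[, 1]
```

If the cluster really does start R with the working directory set to the subfolder, the real script could call dataset_prefix() with no argument and drop the file.path(demo_dir, ...) wrappers.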

--
David.


OR do I have to use another programme that does that?

###
m<-read.table("CottonEA05_M_STAGE.txt")
# "CottonEA05" is the part that differs for each dataset


M<-as.matrix(m[,-c(1)])
rownames(M)<-(m[,1])
pa<-read.table("CottonEA05_D_STAGE.txt",header=T)
timetable<-read.table("TimeBinLookup.txt",header=T,sep="\t")
PA<-as.matrix(pa[,-c(1)])
rownames(PA)<-(pa[,1])
OCHAR<-c()

source("DISPARITY.R")
library(calibrate)
###


Thanks
Martin

--
Martin Hughes
MPhil/PhD Research in Biology
Rm 1.07,  4south
University of Bath
Department of Biology and Biochemistry
Claverton
Bath    BA2 7AY
Tel: 01225 385 437
m.hug...@bath.ac.uk
http://www.bath.ac.uk/bio-sci/biodiversity-lab/hughes.html

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT
