Don, Excellent advice. I've gone back and done a bit of coding and wanted to see what you think and possibly "shore up" some of the technical stuff I am still having a bit of difficulty with.
I'll past the code I have to date with any important annotations: topdir="~" library(gmodels) setwd(topdir) ### Will probably want to do two for loops as opposed to recursive files=list.files(path=topdir,pattern="Coverage") for (i in files) { dir=paste("~/hangers/",i,sep="") files2=list.files(path=dir,pattern="Length") ### Make an empty matrix that will have the independent variable as the filenum and the dependent variable ### as the mean of the length or should I have two vectors for the regression. Basically the Length_(\d+) is the independent variable (which is taken from the filename) which all the regressions will have and then inside the Length_(\d+) is a 1d set of numbers which I take the mean of which in turn becomes the dependent variable. So in essence the points are: f(length)=mean(length$V1) f(45)=50 f(50)=60 etc ... for (j in files2) { ## I just rearranged the following line but I'm not sure what the command is doing ## I am assuming 'as.numeric' means take the input as a number instead of a string and the gsub has #me stumped filenum=as.numeric(gsub('Length_','',j)) ## Can I assign variables at the top instead of hardcoding? like upper=50 , lower=30? ## And I don't need to put brackets for this if statement do I? Does it basically just ## say that if the filenum is outside those parameters, just go to the next j in files2? if (filenum > 200 | filenum < -10) next dir2=paste("~/hangers",i,j,sep="/") tmp=read.table(dir2) mean(tmp($V1)) Now should I put these in a matrix or a vector (all j values (length vs mean(tmp$V1) for each i iteration) } } I think lastly, Id like to get a print out of each of the regressions (each iteration of i). Is that when I use the summary command? And, like in unix, can I redirect the output to a file? Best Don MacQueen wrote: > > I can't go through all the details, but hopefully this will help get > you started. > > If you look at the help page for the list.files() function, you will see > this: > > list.files(path = ".", pattern = NULL, all.files = FALSE, > full.names = FALSE, recursive = FALSE, > ignore.case = FALSE) > > The "." in path means to start at your current working directory. > Assuming your 5 Coverage directories are subdirectories of your > current working directory, that's what you want. > > Then, setting recursive to TRUE will cause it to also list the > contents of all subdirectories. Since your Length files are in the > Coverage subdirectories, that's what you want. > > Finally, the pattern argument returns only files that match the > pattern, so something like > patter="Length" > should get you just the files you want. > > The result is a character vector containing the names of all your > Length files. Try it and see. > > Then, a simple loop over the over the vector of filenames, with an > appropriate scan() or read.table() command for each, will read the > data in. > > If you need to restrict the files, say Length_20, Length_25, > Length_30, etc. then you'll have to do some more work. > Look at > as.numeric(gsub( 'Length_', '', filename)) > to get just the number part of the filename, as a number, and then > you can use numeric inequalities to identify whether or not any > particular file is to be processed. > > Since you haven't shown what the contents of your files look like > (two columns of numbers or what), I have no idea what to suggest for > the part having to do with reading them in, plotting or doing linear > regression. > > The basic function for linear regression is lm(). > > > Here is a summary: > > files <- list.files( '~' , pattern='Length', recursive=TRUE) > > for (fl in files) { > > ## optional, to restrict to only certain files > filenum <- as.numeric(gsub( 'Length_', '', filename)) > > ## skip to next file if it isn't in the correct number range > if (filenum > 50 | filenum < 20) next > > ## a command to read the current file. perhaps: > ## tmp <- read.table(fl) > > ## commands to do statistics on the data in the current file. perhaps: > ## fit <- lm( y ~ y, data=tmp) > > ## some output > cat('------ file =',fl,'-----\n') > print(fit) > > } > > This example doesn't restrict only to certain Coverage subdirectories. > > -Don > > > > At 9:29 AM -0700 9/15/08, bioinformatics_guy wrote: >>Im very new to R so this might be a very simple question. First I'll lay out >>the hierarchy of my directories, goals. >> >>I have say 5 directories of form "Coverage_(some number)" and each one of >>these I have text files of form "Length_(some number)" which are comprised >>of say 30 numbers. Each one of these Length files (which are basically >>incremented by 5 from 0 to 100, Length_(0,5,10,15,20) are to be averaged >>where the average is the y-value and the length is the x-value in a linear >>regression. >> >>What I want to do is, write a script that looks in each of the coverage >>directories and then reads in each of the files, takes the means, and plots >>them in form I specified above. The catch is, what if I only want to plot >>say Length_(20-50) and what command/method is best for a linear regression? >>I've looked at m1(), but have not gotten it to work properly. >> >>Below is some of the code I've put together: >> >>topdir="~" >> >>setwd(topdir) >> >>### Took this function from a friend so I'm not sure what its doing besides >>grep-ing a directory? >>ll<-function(string) >>{ >> grep(string,dir(),value=T) >>} >> >>### I believe this is looking for all files of form below >>subdir = ll("Coverage_[1-9][0-9]$") >> >>### A for loop iterating through each of the sub directories. >>for (i in subdir) >>{ >> #not sure what this line is doing as I found it on the internet >> on a >>similar function >> setwd(paste(getwd(),i,sep="/")) >> #This makes a vector of all the file names >> filelist=ll("Length_") >> >>Can I use a regex or logic to only take the filelist variables I want? >>And can I now get the mean of each Length_* and set in a matrix (length x >>mean)? >> >>Then finally, how to do a linear regression of this. >> >>-- >>View this message in context: http:// www. >>nabble.com/Scripting-in-R----pattern-matching%2C-logic%2C-system-calls%2C-the-works%21-tp19496451p19496451.html >>Sent from the R help mailing list archive at Nabble.com. >> >>______________________________________________ >>R-help@r-project.org mailing list >>https:// stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide http:// www. R-project.org/posting-guide.html >>and provide commented, minimal, self-contained, reproducible code. > > > -- > -------------------------------------- > Don MacQueen > Environmental Protection Department > Lawrence Livermore National Laboratory > Livermore, CA, USA > 925-423-1062 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://www.nabble.com/Scripting-in-R----pattern-matching%2C-logic%2C-system-calls%2C-the-works%21-tp19496451p19512508.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.