On 24.10.2011 23:10, Debs Majumdar wrote:
Thanks Uwe. This works perfectly.

#######


owd<- setwd(pth)
fls<- list.files(pattern="^chr")
ufls<- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
      of<- strsplit(i, "\\.")[[1]]
      of<- paste(of[1], tail(of, 1), sep=".")
      impute2databel(genofile = i,
                     samplefile = paste(i, "info", sep="_"),
                     outfile = of,
                     makeprob=TRUE, old=FALSE)
}
setwd(owd)

####
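
For reference, the output name in the loop above is built from the first and last dot-separated pieces of each file name. A minimal sketch, with one file name hard-coded as an assumption so that it runs without the real data:

```r
# One genotype file name, hard-coded for illustration only.
i <- "chr1.one.phased.impute2.chunk1"

# Split on literal dots; "\\." escapes the dot, which otherwise matches any character.
of <- strsplit(i, "\\.")[[1]]

# Keep the first piece ("chr1") and the last piece ("chunk1").
of <- paste(of[1], utils::tail(of, 1), sep = ".")
print(of)
```

The same two lines give "study1_chr1.chunk1" for a name that starts with "study1_chr1", since the prefix contains no dot.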


I have a question regarding how strsplit works.

When my files are the following:

         chr1.one.phased.impute2.chunk1
         chr1.one.phased.impute2.chunk1_info
         chr1.one.phased.impute2.chunk1_info_by_sample
         chr1.one.phased.impute2.chunk1_summary
         chr1.one.phased.impute2.chunk1_warnings
the line

ufls<- unique(sapply(strsplit(fls, "_"), "[", 1))

works like a charm.
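
A minimal sketch of why this works, with the five file names hard-coded in place of list.files() (an assumption, made only so the snippet is self-contained): strsplit() cuts each name at every underscore, and "[" with index 1 keeps the piece before the first underscore.

```r
# File names hard-coded for illustration; in the real loop they come from list.files().
fls <- c("chr1.one.phased.impute2.chunk1",
         "chr1.one.phased.impute2.chunk1_info",
         "chr1.one.phased.impute2.chunk1_info_by_sample",
         "chr1.one.phased.impute2.chunk1_summary",
         "chr1.one.phased.impute2.chunk1_warnings")

# Each list element holds the pieces of one file name.
pieces <- strsplit(fls, "_")

# Take the first piece of every name, then drop the duplicates.
ufls <- unique(sapply(pieces, "[", 1))
print(ufls)
```

All five names share the same first piece, so a single genotype file name is left per chunk.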

I have another dataset where the files are


         study1_chr1.one.phased.impute2.chunk1
         study1_chr1.one.phased.impute2.chunk1_info
         study1_chr1.one.phased.impute2.chunk1_info_by_sample
         study1_chr1.one.phased.impute2.chunk1_summary
         study1_chr1.one.phased.impute2.chunk1_warnings

... and so on.

I wanted to run the same loop, but I was unable to change the strsplit call so 
that it works when the files are named as above.

I tried

ufls<- unique(sapply(strsplit(fls, "_"), "[", 2))


unique(gsub("(_.*)_.*", "\\1", x))

Should do if there is a first underscore.

Uwe Ligges
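
A minimal sketch of this suggestion, with the file names hard-coded as an assumption. Note that ".*" is greedy, so for a name with more than two underscores (the "_info_by_sample" file) the pattern as written removes only the last piece. The second variant is an assumption and not part of the reply above: it uses "[^_]*" to keep everything up to, but not including, the second underscore.

```r
# File names hard-coded for illustration only.
x <- c("study1_chr1.one.phased.impute2.chunk1",
       "study1_chr1.one.phased.impute2.chunk1_info",
       "study1_chr1.one.phased.impute2.chunk1_info_by_sample",
       "study1_chr1.one.phased.impute2.chunk1_summary",
       "study1_chr1.one.phased.impute2.chunk1_warnings")

# Pattern from the reply: greedy, so "_info_by_sample" only loses "_sample".
greedy <- unique(gsub("(_.*)_.*", "\\1", x))

# Variant: keep the prefix, the first underscore, and the text up to the next underscore.
ufls <- unique(sub("^([^_]*_[^_]*).*$", "\\1", x))

print(greedy)
print(ufls)
```

The variant assumes that every file name contains the "study1_" style prefix; names without any underscore are returned unchanged.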



but this drops the "study1" prefix (modified code below). What change do I 
need to make for this to run?

####

fls<- list.files(pattern="study1_chr")
ufls<- unique(sapply(strsplit(fls, "_"), "[", 2))

library(GenABEL)

for(i in ufls){
      of<- strsplit(i, "\\.")[[1]]
      of<- paste(of[1], tail(of, 1), sep=".")
      impute2databel(genofile = i,
                     samplefile = paste(i, "info", sep="_"),
                     outfile = of,
                     makeprob=TRUE, old=FALSE)

}

#####

Thanks,

  Debs


----- Original Message -----
From: Debs Majumdar<debs_st...@yahoo.com>
To: "r-help@r-project.org"<r-help@r-project.org>
Cc:
Sent: Friday, October 21, 2011 2:32 PM
Subject: Reading in and modifying multiple datasets in a loop



Hi,

   I have been given a set of around 300 files where there are 5 files 
corresponding to each chunk.

E.g. Chunk 1 for chr1 contains these 5 files:

         chr1.one.phased.impute2.chunk1
         chr1.one.phased.impute2.chunk1_info
         chr1.one.phased.impute2.chunk1_info_by_sample
         chr1.one.phased.impute2.chunk1_summary
         chr1.one.phased.impute2.chunk1_warnings

For chr 1 there are 47 chunks, chr2 has 42 chunks...and it ends at chr22 with 
23 chunks.

I am using the DatABEL package to convert them to databel format using the 
following command:


impute2databel(genofile = "chr1.one.phased.impute2.chunk1",
               samplefile = "chr1.one.phased.impute2.chunk1_info",
               outfile = "chr1.chunk1",
               makeprob = TRUE, old = FALSE)

which uses two files per chunk.


Is there a way I can automate this so that the code goes through each chunk of 
each chromosome and does the conversion to databel format?
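
One way to check such an automation before converting anything is a dry run that only lists the arguments each call would receive. The sketch below is an illustration under stated assumptions: it creates empty stand-in files in a temporary directory (the directory name and the two chunk names are hypothetical), and it does not call impute2databel() at all.

```r
# Empty stand-in files in a temporary directory (hypothetical data, not real genotypes).
pth <- file.path(tempdir(), "impute2_demo")
dir.create(pth, showWarnings = FALSE)
stems <- c("chr1.one.phased.impute2.chunk1", "chr1.one.phased.impute2.chunk2")
suffixes <- c("", "_info", "_info_by_sample", "_summary", "_warnings")
file.create(file.path(pth, as.vector(outer(stems, suffixes, paste0))))

# Genotype files are the ones without an underscore suffix.
fls <- list.files(pth, pattern = "^chr")
geno <- fls[!grepl("_", fls)]

# One row per chunk: the three file arguments for the conversion call.
jobs <- data.frame(genofile = geno,
                   samplefile = paste0(geno, "_info"),
                   outfile = sub("^([^.]*)\\..*\\.([^.]*)$", "\\1.\\2", geno),
                   stringsAsFactors = FALSE)
print(jobs)
```

Once the table looks right, each row can be passed to the conversion command shown above.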


Thanks,

  -Debs

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
