Hi,

Try this:

lines1 <- readLines("file1.txt")
lines1 <- lines1[lines1 != ""]

# In "file2.txt", one sequence is wrapped over two lines:
#
#   >or1|1234
#   ATCGGATTCAGG
#   >or2|347
#   GAACCTATCGGGGGGGGAATTTA
#   TATATTTTA                  ### this should be a single line
#   >or3|56
#   ATCGGAGATATAACCAATC
#   >or3|23
#   AAAATTAACAAGAGAATAGACAAAAAAA
#   >or4|793
#   ATCTCTCTCCTCTCTCTCTAAAAA
#   >or7|123456789
#   ACGTGTGTACCCCC
#
# So I modified the file manually so that it looks like:
#
#   >or1|1234
#   ATCGGATTCAGG
#   >or2|347
#   GAACCTATCGGGGGGGGAATTTATATATTTTA
#   >or3|56
#   ATCGGAGATATAACCAATC
#   >or3|23
#   AAAATTAACAAGAGAATAGACAAAAAAA
#   >or4|793
#   ATCTCTCTCCTCTCTCTCTAAAAA
#   >or7|123456789
#   ACGTGTGTACCCCC
#
# and saved it. If you have many lines showing this anomaly, let me know.
# I also added a new line after the last line of the file (using the `Enter`
# key) to suppress the warnings(), which I removed below.

lines2 <- readLines("file2.txt")
lines2 <- lines2[lines2 != ""]

# Pair each header line with the sequence line that follows it.
lines2New <- unlist(lapply(split(lines2, (seq_along(lines2) - 1) %/% 2 + 1),
                           function(x) paste(x, collapse = "\n")),
                    use.names = FALSE)

res <- lapply(lines1, function(x) {
  x1 <- strsplit(x, "\t")[[1]]   ## changed here because the file is tab delimited
  x1New <- x1[-1]
  x2 <- gsub(">(.*)\\n.*", "\\1", lines2New)
  lines3 <- lines2New[match(x1New, x2)]
  write.table(lines3, paste0(x1[1], ".txt"), row.names = FALSE, quote = FALSE)
})

I didn't have any problems with the output. It looks like this:

gene1.txt

x
>or1|1234
ATCGGATTCAGG
>or3|56
ATCGGAGATATAACCAATC
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA

A.K.


Hi,

Thanks Arun. Three output files are generated, but they show x and NA; maybe I have to check the input. Could you please modify the script so that it takes its input directly from files? I have attached the two input files.

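Two snags come up in the reply above: a sequence wrapped over two lines had to be fixed by hand, and write.table() puts an "x" header at the top of every output file (its default col.names = TRUE). Below is a minimal sketch of one way around both, not the code that was posted. It assumes that file2 starts with a ">" header line, that every header begins with ">", and that the fields in file1 are separated by tabs or spaces. The helper names read_fasta_records() and write_gene_files() are made up for this illustration.

```r
# Hypothetical helper (not from the thread): turn FASTA lines into one
# "header\nsequence" string per record, joining sequences that wrap
# over several lines. Assumes the first non-empty line is a ">" header.
read_fasta_records <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[lines != ""]
  # Each header line opens a new record; cumsum() numbers the records.
  record_id <- cumsum(startsWith(lines, ">"))
  records <- split(lines, record_id)
  ids <- unname(vapply(records, function(r) sub("^>", "", r[1]), character(1)))
  seqs <- unname(vapply(records, function(r) paste(r[-1], collapse = ""),
                        character(1)))
  # Named vector: names are the IDs, values are the two-line records.
  setNames(paste0(">", ids, "\n", seqs), ids)
}

# Hypothetical helper (not from the thread): write one file per gene line.
# IDs with no matching header are skipped rather than written as NA.
write_gene_files <- function(gene_lines, records, out_dir = ".") {
  gene_lines <- gene_lines[trimws(gene_lines) != ""]
  for (line in gene_lines) {
    fields <- strsplit(trimws(line), "[ \t]+")[[1]]
    hits <- records[fields[-1]]
    hits <- hits[!is.na(hits)]
    # writeLines() writes the text only, so there is no "x" header.
    writeLines(unname(hits), file.path(out_dir, paste0(fields[1], ".txt")))
  }
  invisible(NULL)
}

# Demo with the data from the thread, including the wrapped sequence.
fasta <- c(">or1|1234", "ATCGGATTCAGG",
           ">or2|347", "GAACCTATCGGGGGGGGAATTTA", "TATATTTTA",
           ">or3|56", "ATCGGAGATATAACCAATC",
           ">or3|23", "AAAATTAACAAGAGAATAGACAAAAAAA",
           ">or4|793", "ATCTCTCTCCTCTCTCTCTAAAAA",
           ">or7|123456789", "ACGTGTGTACCCCC")
genes <- c("gene1\tor1|1234\tor3|56\tor4|793",
           "gene4\tor2|347",
           "gene5\tor3|23\tor7|123456789")

records <- read_fasta_records(fasta)
out_dir <- tempdir()   # written to a temporary directory for the demo
write_gene_files(genes, records, out_dir)
```

With real files, the two vectors would come from readLines("file1.txt") and readLines("file2.txt") instead. Grouping on the header lines rather than on fixed pairs of lines means a wrapped sequence no longer shifts every record after it.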
----- Original Message -----
From: arun <smartpink...@yahoo.com>
To: Utpal Bakshi <utpalm...@gmail.com>
Cc: R help <r-help@r-project.org>
Sent: Tuesday, June 11, 2013 2:52 PM
Subject: Re: Help needed in feature extraction from two input files

Hi,

Try this:

lines1 <- readLines(textConnection("gene1 or1|1234 or3|56 or4|793
gene4 or2|347
gene5 or3|23 or7|123456789"))

lines2 <- readLines(textConnection(">or1|1234
ATCGGATTCAGG
>or2|347
GAACCTATCGGGGGGGGAATTTATATATTTTA
>or3|56
ATCGGAGATATAACCAATC
>or3|23
AAAATTAACAAGAGAATAGACAAAAAAA
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA
>or7|123456789
ACGTGTGTACCCCC"))

# Pair each header line with the sequence line that follows it.
lines2New <- unlist(lapply(split(lines2, (seq_along(lines2) - 1) %/% 2 + 1),
                           function(x) paste(x, collapse = "\n")),
                    use.names = FALSE)

res <- lapply(lines1, function(x) {
  x1 <- strsplit(x, " ")[[1]]
  x1New <- x1[-1]
  x2 <- gsub(">(.*)\\n.*", "\\1", lines2New)
  lines3 <- lines2New[match(x1New, x2)]
  write.table(lines3, paste0(x1[1], ".txt"), row.names = FALSE, quote = FALSE)
})

Attached is one of the files generated by the code.

A.K.


Hi all,

I have two input files. The first file (file1.txt) contains entries in the following tab-delimited format:

gene1 or1|1234 or3|56 or4|793
gene4 or2|347
gene5 or3|23 or7|123456789
.......
..

The second file (file2.txt) contains some additional features along with the header line of the first file, such as:

>or1|1234
ATCGGATTCAGG
>or2|347
GAACCTATCGGGGGGGGAATTTA
TATATTTTA
>or3|56
ATCGGAGATATAACCAATC
>or3|23
AAAATTAACAAGAGAATAGACAAAAAAA
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA
>or7|123456789
ACGTGTGTACCCCC
....
..

From these two files, I want to extract entries by matching the headers row by row, and name each output file after the first column of file1. For example, in the above case, three output files will be generated.
The first output file would be named "gene1.txt" and contain:

>or1|1234
ATCGGATTCAGG
>or3|56
ATCGGAGATATAACCAATC
>or4|793
ATCTCTCTCCTCTCTCTCTAAAAA

The second output file would be named "gene4.txt" and contain:

>or2|347
GAACCTATCGGGGGGGGAATTTATATATTTTA

The third output file would be named "gene5.txt" and contain:

>or3|23
AAAATTAACAAGAGAATAGACAAAAAAA
>or7|123456789
ACGTGTGTACCCCC

Any help in solving the problem is highly appreciated.

Thanks in advance.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.