Well, this is the final code I put together with everyone's help from this
group:
#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";
chomp (my $infile = <STDIN>);
open(INFILE, '<', $infile)
    or die "Can't open INFILE for input: $!";

print "Enter in the path of the OUTFILE:\n";
chomp (my $outfile = <STDIN>);
open(OUTFILE, '>', $outfile)
    or die "Can't open OUTFILE for output: $!";

print "Enter in the LENGTH you want the sequence to be:\n";
chomp (my $len = <STDIN>);

my ($name, @seq);
while ( <INFILE> ) {
    chomp;
    unless ( /^\s*$/ or s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        push @seq, ' '."@char $name";
    }
}
{
    local $" ="\n";
    print OUTFILE "R 1 [EMAIL PROTECTED]"; # The top of the file is supposed
}
close INFILE;
close OUTFILE;
Basically it will take this file:
>dog
atcgc
>cat
atcgctac
>mouse
agctata
and turn it into this:
R 1 10
a t c g c - - - - - dog
a t c g c t a c - - cat
a g c t a t a - - - mouse
However, I forgot that sometimes the input data is like this:
>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat
That is, the sequence of letters can span multiple lines. I would like
the above script to handle input where a sequence spans several lines
as well as input where it does not, and to produce the output shown above.
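One possible approach, as a rough sketch only: start a new record at each header line and append every following sequence line to that record, so wrapped and unwrapped input are treated the same way. The helper names read_records and format_record are made up for illustration, the input is read from an in-memory string instead of INFILE, and the "count length" header line follows the format in the quoted message below. Leaving over-long sequences untruncated is an assumption.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: read records from a filehandle and return a list of
# [name, sequence] pairs, joining sequence lines that wrap over several lines.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;             # skip blank lines
        if ( $line =~ /^\s*>\s*(\S+)/ ) {     # a header line starts a new record
            push @records, [ $1, '' ];
        }
        elsif (@records) {                    # a sequence line extends the current record
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Hypothetical helper: space-separate the letters and pad with '-' up to $len.
sub format_record {
    my ( $name, $seq, $len ) = @_;
    my $pad = $len - length $seq;
    $pad = 0 if $pad < 0;    # assumption: sequences longer than $len are not truncated
    my @char = ( split( //, $seq ), ('-') x $pad );
    return "@char $name";
}

# Sample input taken from the multi-line example above.
my $input = <<'END';
>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat
END

open( my $in, '<', \$input ) or die "Can't open in-memory input: $!";
my @records = read_records($in);
close $in;

my $len = 30;    # in the real script this would come from STDIN
print scalar(@records), " $len\n";
print format_record( @$_, $len ), "\n" for @records;
```

Because a record is only closed when the next header line appears, the same loop handles a sequence on a single line and one wrapped over many lines.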
You all have been a great help! I have really learned a lot from the help
you've given so far!
-Thanks!
-Mike
In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (David Wall) wrote:
> --On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson
> <[EMAIL PROTECTED]> wrote:
>
> > OK, I feel like an idiot. When I initially asked for help with this I
> > just realized that I forgot two little details. I was supposed to add
> > the number of sequences as well as the length of the sequences at the
> > top of the output file.
> >
> > That is this file:
> >
> >> dog
> > agatagatcgcatcga
> >> cat
> > acgcttcgatacgctagctta
> >> mouse
> > agatatacgggtt
> >
> > is really supposed to be:
> >
> > 3 22
> > a g a t a g a t c g c a t c g a - - - - - - dog
> > a c g c t t c g a t a c g c t a g c t t a - cat
> > a g a t a t a c g g g t t - - - - - - - - - mouse
> >
> > The '3' represents the number of individual sequences in the file (i.e.
> > dog, cat, mouse). And the 22 is the number of letters and dashes there
> > are. The length is already in the script as $len. I am able to get the
> > length listed at the top. However, I cannot find a way to have the
> > number of sequences (the 3 in this case) printed to the top.
>
> Here's one way (slightly altering John's solution), but it will use lots of
> memory if the sequences are long.
>
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my ($name, $num_seq, @seq);
> my $len = 30;
> while ( <DATA> ) {
>     unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
>         my $name = $1;
>         my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>         push @seq, "@char $name";
>         $num_seq++;
>     }
> }
> {
>     local $" ="\n";
>     print "[EMAIL PROTECTED]";
> }
>
> __DATA__
> > dog
> agatagatcgcatcga
> > cat
> acgcttcgatacgctagctta
> > mouse
> agatatacgggt