In article <[EMAIL PROTECTED]>,
Robin Garbutt wrote:
> Hi all,
>
> I have a string that is a random sequence like the following:-
>
> ACGTCGTCGTCACACACACGCGTCTCTATACGCG
>
> I want to be able to parse the string, picking out any TATA sequences,
> colour them in red and make a not of where ther lie in the sequence.
>
> Is this possible with perl?
And much more (though not necessarily from me ;-))
Here is my version using a terminal window and with output something like
this (hits in red):
Line# : Char# : Matches
1 : 11 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
1 : 44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
5 : 7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
6 : 3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG
Notice I print same-line multiple hits on separate lines. I suppose the
fancier format would be something like:
Line# : Char# : Matches
1 : 11,44 : AGTGTAGAGTTCTTCATTTTTTACGGACGGTCCGACCGCTGGATCTAGAG
5 : 7 : CTGTATTCTTGAAAGTCCCCCAGCATCCAGGCCATTATCGAATATCGACT
6 : 3 : TTTCTTGCAAGTTAATGGTAGACCTACAGTTGGGGAACTGAGTATCCCAG
I used the substr function (then afterwards remembered that index might be
better/easier for this); I also imagine that the slicker way to do this is
probably with regexes (cue, John Krahn one-liner enters from stage
left...). ;-)
-K (as always, advice, criticism welcome)
#!/usr/bin/perl
use warnings;
use strict;
# find_substring
# I have a string that is a random sequence like the following:-
#
# ACGTCGTCGTCACACACACGCGTCTCTATACGCG
#
# I want to be able to parse the string, picking out any TATA sequences,
# colour them in red and make a not of where ther lie in the sequence.
while (@ARGV) {
my $sequence = 'TATA'; #what we are looking for
my $data = shift;
open FH, "< ", $data
or die "Couldn't open datafile $data for reading: $!\n";
printf "\nLine# : Char# : Matches\n"; # print heading
while (<FH>) {
chomp;
print matches($., $_, $sequence);
}
}
# end main #
# begin sub #
sub matches{
my @matches;
my ($line_nbr, $line, $seq) = @_;
for (0 .. (length($line) - length($seq)) ) {
my $char_position = $_;
my $substring = substr $line, $char_position, length $seq;
if ($substring eq $seq) {
my $hilite_line = hilite($line, $char_position, $seq);
$_++; # add 1 to char position
push @matches, sprintf "%5d : %5d : %s\n", $line_nbr, $_,
$hilite_line;
}
}
return @matches;
}
sub hilite {
my $color_on = "\e[31;1m";
my $color_off = "\e[0m";
my ($line, $char_pos, $seq) = @_;
substr($line, $char_pos, length($seq), "$color_on$seq$color_off");
return $line;
}
## end ##
--
Kevin Pfeiffer
International University Bremen
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]