Wijaya Edward wrote on Thursday, 27 April 2006, 02:51:
> Hi,
> I have two strings and want to compute the number of mismatches between
> them. The two strings are of the "same" size. Let's call them the 'source'
> string and the 'target' string. The problem is that the 'source' and
> 'target' strings may come in ambiguous form, meaning that one position
> may contain more than one (up to 4) characters. An ambiguous position is
> marked with a square-bracketed region such as [ATCG]. Examples follow:
>
> Example 1 (where the source is ambiguous):
>
> my $source1 = '[TCG]GGGG[AT]'; # ambiguous
> my $target1 = 'AGGGGC'; # No of mismatch = 2 on position 1 and 6
> my $target2 = 'TGGGGC'; # No of mismatch = 1 on position 6 only
>
>
> Example 2 (where the source is NOT ambiguous):
>
> my $source2 = 'TGGGGT'; # not-ambiguous
> my $target1 = 'AGGGGC'; # No of mismatch = 2 on position 1 and 6
> my $target3 = 'TGGGGT'; # No of mismatch = 0 all position matches
>
>
> Example 3 (where both source and target are ambiguous)
> my $source1 = '[TCG]GGGG[AT]'; # ambiguous
> my $target1 = 'AGGGG[CT]'; # ambiguous, no of mismatch = 1 only at position 1
>
> For example I can use bitwise operator to do it.
>
> I have no problem when dealing with Example 1 and 2 above.
> But I'm stuck with Example 3, where both source and target are ambiguous.
>
>
> Here is the current snippet I have, which doesn't do the job:
>
> __BEGIN__
> sub mismatches {
> my($source, $target) = @_;
> my @sparts = ($source =~ /(\[.*?\]|.)/g);
> my @tparts = ($target =~ /(\[.*?\]|.)/g);
>
> scalar grep $tparts[$_] !~ /^$sparts[$_]/, 0 .. $#sparts;
> }
> __END__
>
> Where did I go wrong? I humbly seek advice.
Hello Edward
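As far as I can tell, your snippet goes wrong because it uses the source part as a pattern and the target part as a plain string. When the target part is itself bracketed, for example '[CT]', the string literally starts with '[', which is never inside the source class, so the position is always counted as a mismatch. A minimal sketch that isolates that test for one position (original_test is just a name I made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The comparison from the original snippet, applied to one position.
# Returns 1 when the position would be counted as a mismatch, else 0.
sub original_test {
    my ($spart, $tpart) = @_;
    return $tpart !~ /^$spart/ ? 1 : 0;
}

print original_test('[TCG]', 'T'),    "\n";  # plain target, T is allowed
print original_test('[AT]',  'C'),    "\n";  # plain target, C is not allowed
print original_test('[AT]',  '[CT]'), "\n";  # ambiguous target sharing T
```

The last call reports a mismatch although T is allowed on both sides, which is the Example 3 problem.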
Here is one way to do it.
I didn't test it thoroughly, but it demonstrates the alternative approach of
comparing every position in the source and target:
#!/usr/bin/perl
use strict;
use warnings;
sub mismatches {
    my ($source, $target) = @_;

    # split source and target into single positions
    # (a bracketed group such as [TCG] counts as one position)
    my @spos = $source =~ /((?:\[.+?\])|.)/g;
    my @tpos = $target =~ /((?:\[.+?\])|.)/g;

    # debug info
    warn "source positions: ", (join ',', @spos), "\n";
    warn "target positions: ", (join ',', @tpos), "\n";

    my $mm = 0;   # number of mismatches
    my @mmp;      # mismatch positions (0-based)

    # calculate number of mismatches and their positions:
    # the target part is used as the pattern, the source part as the string
    for (0 .. $#spos) {
        if ($spos[$_] !~ qr/$tpos[$_]/) {
            $mm++;
            push @mmp, $_;
        }
    }

    # debug info
    warn "$mm mismatch(es) at positions @mmp;\n";
    return $mm;
}
mismatches('[TCG]GGGG[AT]', 'AGGGGC');
mismatches('[TCG]GGGG[AT]', 'TGGGGC');
mismatches('[TCG]GGGG[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGGG[CT]');
mismatches('[TCG]GG[CT]G[AT]', 'AGGG[AC][CT]');
__END__
source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,C
2 mismatch(es) at positions 0 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: T,G,G,G,G,C
1 mismatch(es) at positions 5;
source positions: [TCG],G,G,G,G,[AT]
target positions: A,G,G,G,G,[CT]
1 mismatch(es) at positions 0;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,G,[CT]
2 mismatch(es) at positions 0 3;
source positions: [TCG],G,G,[CT],G,[AT]
target positions: A,G,G,G,[AC],[CT]
3 mismatch(es) at positions 0 3 4;
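The regex comparison above relies on the bracketed source part being searched as a plain string. A more explicit variant could treat every position as a set of allowed bases and count a mismatch when the two sets share nothing. This is only a sketch under that assumption; the names positions and mismatch_positions are made up for illustration, and the length check is my addition:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a string into positions; each position becomes a hash ref
# whose keys are the bases allowed at that position.
sub positions {
    my ($str) = @_;
    my @pos;
    for my $part ($str =~ /(\[[^\]]+\]|.)/g) {
        (my $bases = $part) =~ tr/[]//d;    # drop the brackets
        my %set = map { $_ => 1 } split //, $bases;
        push @pos, \%set;
    }
    return @pos;
}

# Return the 0-based positions where source and target share no base.
# In scalar context this yields the number of mismatches.
sub mismatch_positions {
    my ($source, $target) = @_;
    my @s = positions($source);
    my @t = positions($target);
    die "source and target differ in length\n" unless @s == @t;

    my @mmp;
    for my $i (0 .. $#s) {
        my $shared = grep { $t[$i]{$_} } keys %{ $s[$i] };
        push @mmp, $i unless $shared;
    }
    return @mmp;
}

my @where = mismatch_positions('[TCG]GG[CT]G[AT]', 'AGGG[AC][CT]');
print scalar(@where), " mismatch(es) at positions @where\n";
```

For the five test calls above this should give the same counts and positions as the debug output, without depending on how the brackets happen to behave inside a regex.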
hope this helps
Dani