On Mon, Oct 1, 2012 at 5:15 PM, Florian Huber <[email protected]> wrote:
>
> I'm trying to extract a DNA sequence out of a larger string, i.e. the string
> is of the following structure:
>
> $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/"
>
> But when I do
>
> $string =~ /[ACGT]/;
>
> it matches only the last letter, i.e. "G". Why doesn't it start at the
> beginning?
>
> My confusion was complete when I tried
>
> $string =~ /[ACGT]{5}/;
>
> now it matches 5 letters, but this time from the beginning, i.e.: ACGAC.
$ cat /tmp/g.pl
$string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";
## match and replace with an 'X'
if ( $string =~ s/([ACGT])/X/ ) {
    print "Matched: $1 in $string\n";
}
Macintosh-3:~ afbach$ perl /tmp/g.pl
Matched: T in /NOXNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/
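The square brackets create a character class, "either A or C or G or T",
which takes up exactly one position. A following "*" makes it zero or
more, "+" one or more, and "{5}" means "exactly 5", each of which can be
any of those letters. The regex engine also takes the leftmost match it
can find, which is why the single [ACGT] above picked up the "T" in
NOTNEEDED rather than anything in the sequence itself.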
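To make the quantifiers concrete, here is a small sketch run against the
sample string. The helper name first_match is made up for illustration;
it just returns whatever text the pattern matched first.

```perl
use strict;
use warnings;

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";

# first_match is a hypothetical helper: it returns the text the
# pattern matched first, or undef if nothing matched.
sub first_match {
    my ($re, $str) = @_;
    my ($m) = $str =~ /($re)/;
    return $m;
}

# The engine always reports the leftmost match it can find.
print "one:  ", first_match(qr/[ACGT]/,    $string), "\n";  # T (from NOT)
print "plus: ", first_match(qr/[ACGT]+/,   $string), "\n";  # T (run of one)
print "five: ", first_match(qr/[ACGT]{5}/, $string), "\n";  # ACGAC
```

The first two land on the "T" in NOTNEEDED because that is the first
place a match is possible; only "{5}" has to go further along, to the
first spot with five of those letters in a row.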
If I'm understanding you, you want the run of [ACGT]s delimited by
non-ACGT characters. Your example has the /* ... */ markers, so
if ( $string =~ m#/\*([ACGT]+)\*/# ) {
would work, but I doubt that's your real data. Is the string you want
the sequence of only ACGTs? This sort of works:
if ( $string =~ m/[^ACGT]([ACGT]+)[^ACGT]/ ) {
    print "Matched: $1 in $string\n";
}
but ....
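The catch is that this pattern also takes the leftmost match, so on the
sample string it captures the lone "T" in NOTNEEDED (non-ACGT "O", then
"T", then non-ACGT "N") rather than the sequence. One possible way
around that, assuming the sequence you want is the longest unbroken run
of ACGTs in the string (that's my guess, nothing in your data guarantees
it), is to collect every run and keep the longest. The name
longest_dna_run is made up for this sketch.

```perl
use strict;
use warnings;

# longest_dna_run is a hypothetical helper. It ASSUMES the wanted
# sequence is the longest unbroken run of A, C, G and T in the string.
sub longest_dna_run {
    my ($str) = @_;
    my $best = '';
    # With /g in list context, the match returns every run, left to right.
    for my $run ( $str =~ /([ACGT]+)/g ) {
        $best = $run if length($run) > length($best);
    }
    return $best;
}

my $string = "/NOTNEEDED/*ACGACGGGTTCAAGGCAG*/NOTNEEDED/";
print "Matched: ", longest_dna_run($string), "\n";
# prints: Matched: ACGACGGGTTCAAGGCAG
```

If two runs tie for longest, this keeps the first one, and it returns an
empty string when there are no ACGTs at all. If the surrounding text can
contain long stretches of those four letters, the assumption breaks and
you'd need real delimiters.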
--
a
Andy Bach,
[email protected]
608 658-1890 cell
608 261-5738 wk