On Mon, Jun 28, 2010 at 19:48, John W. Krahn wrote:
snip
> s/\((\d+)\)/(<a href="mypage.php?$1">$1</a>)/g;
snip
Since Perl 5.8.0, \d does not mean [0-9]; it means any character that
is classified as a digit in Unicode. In Perl 5.12.1, there are five
hundred seventy-seven characters that will match \d. If it is your
intent to replace "᠔᠒" with '<a href="mypage.php?᠔᠒">᠔᠒</a>', then \d
is a good choice; however, if you only want to match digits you can do
math with, I would suggest using [0-9].
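
To make the difference concrete, here is a minimal sketch that checks the
same two strings against both patterns. The helper name digit_report is
made up for illustration; the Mongolian string is the "᠔᠒" from above,
written as escapes so the source file stays ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a string is made up entirely of
# \d characters, and whether it is made up entirely of ASCII [0-9].
sub digit_report {
    my ($s) = @_;
    return {
        unicode_d => ($s =~ /\A\d+\z/    ? 1 : 0),
        ascii_09  => ($s =~ /\A[0-9]+\z/ ? 1 : 0),
    };
}

my $ascii     = "42";
my $mongolian = "\x{1814}\x{1812}";  # MONGOLIAN DIGIT FOUR, MONGOLIAN DIGIT TWO

for my $pair ([ascii => $ascii], [mongolian => $mongolian]) {
    my ($name, $s) = @$pair;
    my $r = digit_report($s);
    printf "%-10s \\d: %d  [0-9]: %d\n", $name, $r->{unicode_d}, $r->{ascii_09};
}
```

Both strings match \d, but only the ASCII one matches [0-9].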
Note that Mongolian isn't the only problem: there is also "𝟺𝟸", which
looks like "42" but is really "\x{1d7fa}\x{1d7f8}" (mathematical
monospace digits). If you want both "\x{1d7fa}\x{1d7f8}" and "42" to
point to the same page, you will need to use some form of
transliteration like [Unicode::Digits][1]:
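
    use Unicode::Digits qw/digits_to_int/;

    # Match any Unicode digits in parentheses; link to the ASCII
    # integer value but keep the original digits as the link text.
    s{
        \( (\d+) \)
    }{
        sprintf '(<a href="mypage.php?%d">%s</a>)', digits_to_int($1), $1
    }xeg;
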
[1]: http://search.cpan.org/dist/Unicode-Digits/lib/Unicode/Digits.pm
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.