ID:               49036
Updated by:       scott...@php.net
Reported By:      gvdefence-ncr at yahoo dot it
Status:           Bogus
Bug Type:         PCRE related
Operating System: WXP
PHP Version:      5.2.10
New Comment:

We have Perl Compatible Regular Expressions, *NOT* POSIX regular expressions.


Previous Comments:
------------------------------------------------------------------------
[2009-07-23 18:54:58] gvdefence-ncr at yahoo dot it

To me this only means that the PHP documentation is wrong too.

1st) There is a paradox: if [\w] (I tested the same issue with \W) does the matching depending on the locale setting, then [A-Za-z0-9_] (which is the same as [\w]) should behave in the same way and also match accented characters like àèìòù depending on the locale setting. Since that does not happen, the latter would be the bug.

2nd) I wonder how to inform all the websites on the internet (including Wikipedia) that PHP regular expression syntax is different from the common standard used by the rest of the world!

PS: I adore PHP, just trying to help. Bye! :)

------------------------------------------------------------------------
[2009-07-23 18:14:04] der...@php.net

http://uk.php.net/manual/en/regexp.reference.backslash.php clearly explains it:

\w  any "word" character
\W  any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

------------------------------------------------------------------------
[2009-07-23 17:59:28] gvdefence-ncr at yahoo dot it

What's locale? [\W] being identical to [^A-Za-z0-9_] is not only Microsoft's idea.
\W matches any non-word character; it is the same as [^\w], which is the same as [^A-Za-z0-9_] [http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]. Many other websites about regular expressions say the same thing as Microsoft and Wikipedia; you can search in Google. Sorry, but this is a bug!

BTW: the PHP manual is completely useless regarding regular expression syntax; it does not help in any way, which is why I added the Microsoft documentation link.

------------------------------------------------------------------------
[2009-07-23 17:41:11] scott...@php.net

PCRE is locale aware, so there are some exceptions. What locale are you using?

Also, we use PCRE, which is not the Microsoft regexp syntax; I suggest you read the PHP manual instead.

------------------------------------------------------------------------
[2009-07-23 17:34:25] gvdefence-ncr at yahoo dot it

Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx), but preg_replace does not seem to work the same way.

Reproduce code:
---------------
<?php
// According to regexp syntax, [\W] is identical to [^A-Za-z0-9_]
// (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)
$result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test àèìòù test");
$result2 = preg_replace('/[\W]*/', '', "test àèìòù test");
echo "<pre>" . $result1 . "</pre>"; // OK, it shows: "testtest"
echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "testàèìòùtest"
?>

Expected result:
----------------
testtest
testtest

Actual result:
--------------
testtest
testàèìòùtest

------------------------------------------------------------------------
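A locale-independent way to get the result the reporter expected is to spell out the ASCII class instead of relying on \W. The following is a minimal sketch, assuming PHP 7 or later (not the 5.2.10 build in the report) and a UTF-8 encoded script; the helper name strip_non_ascii_word_chars is hypothetical. It also shows the opposite case: with the u modifier, PHP enables Unicode properties for \w and \W, so accented letters are deliberately treated as word characters.

```php
<?php
// Sketch only: assumes PHP 7 or later (PCRE2) and a UTF-8 encoded script.

// Hypothetical helper (not part of PHP): keep only ASCII letters, digits
// and the underscore. An explicit class like [^A-Za-z0-9_] does not depend
// on PCRE's locale character tables, so the result is the same everywhere.
function strip_non_ascii_word_chars(string $s): ?string
{
    // The u modifier treats the subject as UTF-8, so each accented letter is
    // matched as one code point. preg_replace() returns null if $s is not
    // valid UTF-8.
    return preg_replace('/[^A-Za-z0-9_]+/u', '', $s);
}

$input = "test àèìòù test";

echo strip_non_ascii_word_chars($input), "\n"; // testtest

// By contrast, with the u modifier PHP also turns on Unicode properties for
// \w and \W, so accented letters count as "word" characters and \W+ leaves
// them in place instead of stripping them.
echo preg_replace('/\W+/u', '', $input), "\n";
```

An explicit class is the simpler choice when the goal is ASCII-only output; relying on \W instead ties the result to the locale or Unicode mode in effect, which is exactly the point of contention in this report.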