ID:               49036
 Updated by:       scott...@php.net
 Reported By:      gvdefence-ncr at yahoo dot it
 Status:           Bogus
 Bug Type:         PCRE related
 Operating System: WXP
 PHP Version:      5.2.10
 New Comment:

PHP's preg functions use Perl Compatible Regular Expressions, *NOT* POSIX
regular expressions.


Previous Comments:
------------------------------------------------------------------------

[2009-07-23 18:54:58] gvdefence-ncr at yahoo dot it

To me this only means that the PHP documentation is wrong as well.

1st) There is a paradox:
if [\w] (I saw the same issue with \W) matches according to the locale
setting, then [A-Za-z0-9_] (which is the same as [\w]) should behave the
same way and also match accented characters like àèìòù, depending on the
locale setting. Since it does not, that would be the bug.

2nd) I wonder how to inform all the websites on the internet
(including Wikipedia) that PHP's regular expression syntax differs from
the common-sense standard used by the rest of the world!



PS
I adore PHP, just trying to help. Bye!)

------------------------------------------------------------------------

[2009-07-23 18:14:04] der...@php.net

http://uk.php.net/manual/en/regexp.reference.backslash.php clearly
explains it:

\w
    any "word" character
\W
    any "non-word" character

Each pair of escape sequences partitions the complete set of characters
into two disjoint sets. Any given character matches one, and only one,
of each pair.

A "word" character is any letter or digit or the underscore character,
that is, any character which can be part of a Perl "word". The
definition of letters and digits is controlled by PCRE's character
tables, and may vary if locale-specific matching is taking place. For
example, in the "fr" (French) locale, some character codes greater than
128 are used for accented letters, and these are matched by \w. 
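
The locale dependence described in the manual text above can be
sidestepped by spelling out what counts as a "word" character. The
following is a minimal sketch, not part of the manual: both function
names are hypothetical, and the second function assumes the input is
valid UTF-8 and that PCRE was built with Unicode property support.

```php
<?php
// Hypothetical helper: ASCII-only notion of a "word" character.
// The explicit ranges are literal byte ranges, so the result does not
// depend on the current locale.
function strip_non_ascii_word($text)
{
    return preg_replace('/[^A-Za-z0-9_]+/', '', $text);
}

// Hypothetical helper: Unicode-aware notion of a "word" character.
// The /u modifier treats pattern and subject as UTF-8; \p{L} and \p{N}
// match any letter and any number. Assumes valid UTF-8 input.
function strip_non_unicode_word($text)
{
    return preg_replace('/[^\p{L}\p{N}_]+/u', '', $text);
}

echo strip_non_ascii_word("test àèìòù test"), "\n";    // testtest
echo strip_non_unicode_word("test àèìòù test"), "\n";  // testàèìòùtest
```

Choosing one of the two makes the intent explicit: either accented
letters are always stripped, or they are always kept, regardless of the
locale the script happens to run under.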

------------------------------------------------------------------------

[2009-07-23 17:59:28] gvdefence-ncr at yahoo dot it

What's a locale?

That [\W] is identical to [^A-Za-z0-9_] is not only Microsoft's idea.

\W means "match any non-word character"; it is the same as [^\w], which
is the same as [^A-Za-z0-9_]
[http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]

Many other websites that discuss regular expressions say the same thing
as Microsoft and Wikipedia; you can search on Google.

Sorry, but this is a bug!


BTW: the PHP manual is completely useless regarding regular expression
syntax; it does not help in any way. That's why I added the Microsoft
documentation link.

------------------------------------------------------------------------

[2009-07-23 17:41:11] scott...@php.net

PCRE is locale-aware, so there are some exceptions. What locale are you
using?

Also, we use PCRE, which is not the Microsoft regexp syntax. I suggest
you read the PHP manual instead.
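
One way to answer the locale question is to print the active LC_CTYPE
locale, then rerun the pattern with LC_CTYPE pinned to "C". This is a
rough sketch only: the helper name is hypothetical, and it assumes PCRE's
default character tables, in which bytes above 127 are not "word"
characters.

```php
<?php
// Hypothetical helper: run the report's pattern with LC_CTYPE pinned to
// "C", then restore whatever locale was active before.
function strip_non_word_in_c_locale($text)
{
    $previous = setlocale(LC_CTYPE, '0');   // "0" only queries the setting
    setlocale(LC_CTYPE, 'C');
    $result = preg_replace('/[\W]*/', '', $text);
    setlocale(LC_CTYPE, $previous);         // put the old locale back
    return $result;
}

echo "LC_CTYPE is: " . setlocale(LC_CTYPE, '0') . "\n";
echo strip_non_word_in_c_locale("test àèìòù test") . "\n";
```

If the output under "C" differs from the output under the reported
locale, the difference comes from the locale-specific character tables
rather than from the pattern itself.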

------------------------------------------------------------------------

[2009-07-23 17:34:25] gvdefence-ncr at yahoo dot it

Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see:
http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)

But preg_replace does not seem to work that way.


Reproduce code:
---------------
<?php
   // According to regexp syntax, [\W] is identical to [^A-Za-z0-9_]
   // (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)

   $result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test àèìòù test");
   $result2 = preg_replace('/[\W]*/', '', "test àèìòù test");

   echo "<pre>" . $result1 . "</pre>"; // ok, it shows: "testtest"
   echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "testàèìòùtest"
?>

Expected result:
----------------
testtest

testtest

Actual result:
--------------
testtest

testàèìòùtest


------------------------------------------------------------------------

