Edit report at https://bugs.php.net/bug.php?id=51563&edit=1
 ID:                 51563
 Comment by:         jmichae3 at yahoo dot com
 Reported by:        zdenis at free dot fr
 Summary:            Incorrect result
 Status:             Assigned
 Type:               Bug
 Package:            mbstring related
 Operating System:   Windows
 PHP Version:        5.3.2
 Assigned To:        moriyoshi
 Block user comment: N
 Private report:     N

 New Comment:

I am getting Russian spam through my email forms. Strangely enough,
mb_detect_encoding() reports my form mail content string as ASCII, even
though the characters are around the Unicode Ѐ range. This prevents me
from detecting foreign-language characters in my form mail. Please fix.

My code is:

//detect foreign languages
$arr[0] = "ASCII";
$arr[1] = "US-ASCII";
if (false === mb_detect_encoding($comment, $arr, true)) {
    echo "<div style='color:red;'>ERRORB:".mb_detect_encoding($comment)."</div>";
    return true; //error
}

Using this string, which I generated from charmap:

ÐÏÐγÏÐÐÐЫЫÐÏÐÐÑмдп

I get ASCII as the result of that last mb_detect_encoding($comment) call.


Previous Comments:
------------------------------------------------------------------------
[2010-04-15 16:06:23] zdenis at free dot fr

Description:
------------
When using mb_detect_encoding, the detected charset is inconsistent, and
sometimes wrong, depending on how many é characters (or any other
characters above 127) are present in the string.

Test script:
---------------
// little example
php -r "echo mb_detect_encoding(\"é\", 'UTF-8,ISO-8859-1');"
php -r "echo mb_detect_encoding(\"éé\", 'UTF-8,ISO-8859-1');"

// real life example
php -r "echo mb_detect_encoding(\"Produit commandé\", 'UTF-8,ISO-8859-1');"
php -r "echo mb_detect_encoding(\"Société\", 'UTF-8,ISO-8859-1');"

Expected result:
----------------
ISO-8859-1
ISO-8859-1
ISO-8859-1
ISO-8859-1

Actual result:
--------------
UTF-8
ISO-8859-1
UTF-8
ISO-8859-1

------------------------------------------------------------------------
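
Both comments in this report come down to telling ASCII, UTF-8 and ISO-8859-1 input apart. As a possible workaround while the bug is open, the sketch below avoids mb_detect_encoding() and inspects the bytes directly. The helper names contains_non_ascii() and guess_utf8_or_latin1() are hypothetical and do not appear in the report, and the second helper assumes the input can only be UTF-8 or ISO-8859-1. It does not fix or explain the mbstring behaviour described above.

```php
<?php
// Hypothetical helper (not from the report): true if $s contains any byte
// above 0x7F, i.e. anything that is not plain 7-bit ASCII.
function contains_non_ascii(string $s): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

// Hypothetical helper: assumes the input is either UTF-8 or ISO-8859-1.
// Returns 'UTF-8' when the bytes form valid UTF-8, otherwise 'ISO-8859-1'.
// Pure ASCII input is valid UTF-8 and is therefore reported as 'UTF-8'.
function guess_utf8_or_latin1(string $s): string
{
    // With the /u modifier, preg_match() returns false on malformed UTF-8
    // and 1 for the empty pattern on well-formed input.
    return preg_match('//u', $s) === 1 ? 'UTF-8' : 'ISO-8859-1';
}
```

For the form-mail case, contains_non_ascii($comment) could replace the strict mb_detect_encoding() check. For the reporter's case, the byte string "Soci\xE9t\xE9" (ISO-8859-1) is not valid UTF-8, so the second helper falls back to ISO-8859-1 regardless of how many é characters the string contains.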