From: [EMAIL PROTECTED]
Operating system: All
PHP version: 4.3.0RC3
PHP Bug Type: Scripting Engine problem
Bug description: htmlspecialchars() misbehaviour

htmlspecialchars() handles the '&' character incorrectly: it does not check whether the '&' is already part of an entity. This produces very "funny" results when the function is called several times on the same string. For example:

    echo htmlspecialchars(htmlspecialchars(htmlspecialchars(htmlspecialchars(htmlspecialchars('text & text')))));

will produce:

    text &amp;amp;amp;amp;amp; text

The most correct behaviour would be to check whether the '&' is followed by a valid entity as described in the HTML specification. However, that can be quite hard to do, because there are a lot of entities. So another way is also possible (it should be faster, but dirtier): just check whether the '&' character starts something that merely looks like an entity.

Here are 2 regular expressions which implement correct '&' handling:

1. This is the correct way to handle entities:

    preg_replace(
        '/\&(?!((#\d{1,5})|(#(x|X)[\dA-Fa-f]{1,4})|[aA]acute|[aA]circ|acute|(ae|AE)lig|' .
        '[aA]grave|alefsym|[aA]lpha|amp|an[dg]|[aA]ring|asymp|[aA]tilde|[aA]uml|' .
        'bdquo|[bB]eta|brvbar|bull|cap|[cC]cedil|cedil|cent|[cC]hi|circ|clubs|cong|' .
        'copy|crarr|cup|curren|[dD]agger|d[aA]rr|deg|[dD]elta|diams|divide|[eE]acute|' .
        '[eE]circ|[eE]grave|empty|e[mn]sp|[eE]psilon|equiv|[eE]ta|eth|ETH|[eE]uml|' .
        'euro|exist|fnof|forall|frac1[24]|frac34|frasl|[gG]amma|g[et]|h[aA]rr|hearts|' .
        'hellip|[iI]acute|[iI]circ|iexcl|[iI]grave|image|infin|int|[iI]ota|iquest|' .
        'isin|[iI]uml|[kK]appa|[lL]ambda|lang|laquo|l[aA]rr|lceil|ldquo|le|lfloor|' .
        'lowast|loz|lrm|lsa?quo|lt|macr|mdash|micro|middot|minus|[mM]u|nabla|nbsp|' .
        'ndash|n[ei]|not(in)?|nsub|[nN]tilde|[nN]u|[oO]acute|[oO]circ|(oe|OE)lig|' .
        '[oO]grave|oline|[oO]mega|[oO]micron|oplus|or|ord[fm]|[oO]slash|[oO]tilde|' .
        'otimes|[oO]uml|par[at]|permil|perp|[pP]hi|[pP]i|piv|plusmn|pound|[pP]rime|' .
        'pro[dp]|[pP]si|quot|radic|rang|raquo|r[aA]rr|rceil|rdquo|real|reg|rfloor|' .
        '[rR]ho|rlm|rsaquo|rsquo|sbquo|[sS]caron|sdot|sect|shy|[sS]igma|sigmaf|sim|' .
        'spades|sube?|sum|sup[123e]?|szlig|[tT]au|there4|[tT]heta|thetasym|thinsp|' .
        'thorn|THORN|tilde|times|trade|[uU]acute|u[aA]rr|[uU]circ|[uU]grave|uml|' .
        'upsih|[uU]psilon|[uU]uml|weierp|[xX]i|[yY]acute|yen|[yY]uml|[zZ]eta|zwn?j);)/',
        '&amp;', $str);

2. This is less correct, but still a better way to handle them:

    preg_replace('/&(?!(([A-Za-z_:][A-Za-z0-9\.\-_:]*)|(#\d+)|(#(x|X)[\dA-Fa-f]+));)/','&amp;',$str);

The good thing about the second regexp is that, if htmlspecialchars() implements this approach, it can also be used to handle XML entities.

-- 
Edit bug report at http://bugs.php.net/?id=21027&edit=1
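
As a rough illustration of the second expression from the report, the sketch below wraps it in a small helper and checks that applying it twice gives the same result as applying it once. The function name escape_amp_once() is hypothetical and not part of PHP; the pattern is the one from item 2, and the helper only deals with '&', not with '<', '>' or quotes.

```php
<?php
// Hypothetical helper (not a PHP built-in): escape '&' only when it does not
// already start something shaped like an entity reference.
function escape_amp_once($str)
{
    // Pattern from item 2 of the report: a named reference, a decimal
    // reference, or a hexadecimal reference, each terminated by ';'.
    $pattern = '/&(?!(([A-Za-z_:][A-Za-z0-9\.\-_:]*)|(#\d+)|(#(x|X)[\dA-Fa-f]+));)/';
    return preg_replace($pattern, '&amp;', $str);
}

// A bare '&' next to references that are already well formed.
$once  = escape_amp_once('text & text &lt; &#38; &#x26;');
$twice = escape_amp_once($once);

echo $once, "\n";
echo $twice, "\n";
```

Because the pattern accepts any well-formed name, a reference to an unknown entity such as &foo; is also left untouched. That is the trade-off the report describes as "less correct".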