Edit report at https://bugs.php.net/bug.php?id=38316&edit=1
ID:                 38316
Comment by:         reg dot php at alf dot nu
Reported by:        raymond at rnamusic dot com
Summary:            html_entity_decode() unexpected results
Status:             Open
Type:               Feature/Change Request
Package:            Feature/Change Request
Operating System:   Linux
PHP Version:        4.4.3
Block user comment: N
Private report:     N

New Comment:

The documentation (well, the signature at the top) claims the third argument defaults to UTF-8; this is wrong.

You want html_entity_decode($string, ENT_QUOTES, 'UTF-8')

Previous Comments:
------------------------------------------------------------------------
[2006-08-03 16:28:33] raymond at rnamusic dot com

Note: your bug submission script translates my example ASCII character into an entity, so where you read "é" it should be a single ASCII character. FYI.

------------------------------------------------------------------------
[2006-08-03 16:27:00] raymond at rnamusic dot com

Description:
------------
In all the example code, and in all the PHP functions, I cannot find a simple snippet that will find HTML entities that are attached to characters (e.g. "é", a Unicode construct) and decode them properly (to "é").

The string "Japrisot, Sébastien" is just ignored by html_entity_decode() and returned as is, with nothing changed.

The only solution seems to be to write a custom replacement function, which seems a bit odd since html_entity_decode() purports to decode common entities. If you work with MARC records, as I do, you come across these entities all the time.

Reproduce code:
---------------
<?php
$string = "Japrisot, Sébastien";
$decoded = html_entity_decode($string);
echo $decoded;
?>

Expected result:
----------------
Japrisot, Sébastien

Actual result:
--------------
Japrisot, Sébastien

------------------------------------------------------------------------
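
For illustration, a minimal sketch of the call suggested in the new comment. It assumes a PHP build whose html_entity_decode() accepts 'UTF-8' as the charset, which may not hold for the 4.4.3 version named in the report. The entity in the input string is also an assumption: the tracker mangled the reporter's original text, so a named entity (&eacute;) stands in for it here.

```php
<?php
// Hypothetical input: &eacute; stands in for the entity the tracker mangled.
$string = "Japrisot, S&eacute;bastien";

// Pass the charset explicitly rather than relying on the default third argument.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// $decoded now holds the UTF-8 bytes 0xC3 0xA9 in place of the entity.
echo $decoded, "\n";
```

If the original input used a plain letter followed by a combining-accent entity rather than a named entity, the decoded result would be two code points instead of one precomposed character, so the output may still differ from what the reporter expected.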