Edit report at https://bugs.php.net/bug.php?id=61354&edit=1

 ID:                 61354
 Comment by:         hufeng1987 at gmail dot com
 Reported by:        hufeng1987 at gmail dot com
 Summary:            htmlentities and htmlspecialchars doesn't respect
                     the default_charset
 Status:             Not a bug
 Type:               Bug
 Package:            Strings related
 Operating System:   Linux/Windows/
 PHP Version:        5.4.0
 Block user comment: N
 Private report:     N

 New Comment:

If your project uses GB2312 as its default charset encoding, you will find 
after upgrading to PHP 5.4 that htmlspecialchars() no longer works as it used to.

To make it work correctly again, replace the old call with the new one:

old code:

htmlspecialchars($string);

new code:

htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'GB2312');

Recoding the whole project is a huge amount of work, especially when the 
project is old.
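
One way to limit that work is to route every call through a single wrapper, so 
the encoding is named in one place only. The following is a minimal sketch 
under stated assumptions: the constant APP_CHARSET and the helper h() are 
hypothetical names, not part of PHP, and the flags shown are the PHP 5.4 
defaults.

```php
<?php
// Hypothetical project-wide setting (an assumption for this sketch).
define('APP_CHARSET', 'GB2312');

// Hypothetical helper: escapes a string using the project's encoding,
// so individual call sites never have to name the charset.
function h($string, $flags = null)
{
    if ($flags === null) {
        // Same flags that htmlspecialchars() uses by default in PHP 5.4.
        $flags = ENT_COMPAT | ENT_HTML401;
    }
    return htmlspecialchars($string, $flags, APP_CHARSET);
}

// The first two characters of the test string, written as GB2312 bytes so
// the example does not depend on how this file is saved.
$gb = "<p>\xCE\xD2\xCA\xC7</p>";
echo h($gb), "\n";
```

Existing calls of the form htmlspecialchars($string) could then be changed to 
h($string) with a mechanical search and replace.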


Previous Comments:
------------------------------------------------------------------------
[2012-03-12 06:05:54] hufeng1987 at gmail dot com

Maybe you are right, and PHP 5.4 should have UTF-8 as the default encoding.


But in a production environment, this will cause more accidents.


Why can't PHP handle default_charset sensibly? That would free us from recoding.

------------------------------------------------------------------------
[2012-03-12 06:04:35] ras...@php.net

What do you mean it is impossible to rewrite old code? In previous versions 
htmlspecialchars() didn't respect the default_charset ini setting either. It 
only looks at that setting if you pass an empty string as the encoding. The 
change in PHP 5.4 was simply to switch from ISO-8859-1 to UTF-8 when you do not 
specify a charset.
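
The effect of that switch can be sketched by naming the two defaults 
explicitly. This is an illustration, not taken from the report: the test 
string is written as GB2312 bytes in hex escapes (an assumption about the 
reporter's data), and the flags are spelled out so the result does not depend 
on the defaults of the PHP version running it.

```php
<?php
// The string from the test script, as GB2312 bytes.
$gb = "<p>\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4</p>";
$flags = ENT_COMPAT | ENT_HTML401;

// Old default: ISO-8859-1 is single-byte, so the GB2312 bytes pass through
// untouched and only <, >, & and " are converted.
$asLatin1 = htmlspecialchars($gb, $flags, 'ISO-8859-1');

// New default: the GB2312 bytes are not valid UTF-8, so the call fails
// and returns an empty string.
$asUtf8 = htmlspecialchars($gb, $flags, 'UTF-8');

// Explicit encoding matching the data.
$asGb = htmlspecialchars($gb, $flags, 'GB2312');

var_dump($asLatin1 === $asGb); // the old default happened to give the same result
var_dump($asUtf8 === '');      // the new default rejects the input
```

This is why code that never named an encoding appeared to work before 5.4: 
the single-byte default did not validate the multi-byte data at all.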

------------------------------------------------------------------------
[2012-03-12 05:56:17] hufeng1987 at gmail dot com

If this is not a bug, why did this change break our old project?


In PHP versions before 5.4, we could call htmlspecialchars() simply as:

htmlspecialchars($string);

and this call did not break the string.

But now, under PHP 5.4, the default encoding has changed to UTF-8, which may 
break old code.

It is impossible to rewrite the old code to add an explicit charset encoding.

------------------------------------------------------------------------
[2012-03-12 05:47:19] ras...@php.net

There is some confusion around this point. The default_charset in your php.ini 
file is meant to be the output encoding. What you specify here is what ends up 
in the HTTP Content-type response header. You should be able to change that 
without messing up your internal runtime encoding, which is why setting it does 
not automatically change the internal encoding used by 
htmlspecialchars/htmlentities. You can force these functions to look at it by 
setting the 3rd arg (the encoding) of the htmlspecialchars() call to "" (an 
empty string). This is documented on the http://php.net/htmlspecialchars page. 
But, like I mentioned, you should be able to change your output encoding 
separately from your internal runtime encoding, so we don't suggest doing this. 
The safest approach is to explicitly set your encoding on your 
htmlspecialchars() calls. There are times when you get data from sources that 
have different encodings, so two htmlspecialchars() calls in the same app may 
need to use different encodings.
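
The two options described above can be sketched as follows. This is a hedged 
illustration rather than a recommendation: the sample strings are assumptions 
(one character as GB2312 bytes, the same character as UTF-8 bytes), 
default_charset is set with ini_set() only to keep the example self-contained, 
and the empty-string lookup assumes no other encoding source (such as 
zend.multibyte script-encoding detection) takes precedence.

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;

$gb   = "<b>\xCE\xD2</b>";     // one Chinese character as GB2312 bytes
$utf8 = "<b>\xE6\x88\x91</b>"; // the same character as UTF-8 bytes

// Option 1: an empty string as the 3rd argument makes the function consult
// default_charset (normally set in php.ini).
ini_set('default_charset', 'GB2312');
$viaIni = htmlspecialchars($gb, $flags, '');

// Option 2 (the approach suggested above): name the encoding on each call,
// so data from differently encoded sources is handled correctly.
$fromGbSource   = htmlspecialchars($gb, $flags, 'GB2312');
$fromUtf8Source = htmlspecialchars($utf8, $flags, 'UTF-8');

var_dump($viaIni === $fromGbSource);
echo $fromUtf8Source, "\n";
```

With option 2, changing the output encoding in php.ini later does not silently 
change how existing strings are escaped.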

------------------------------------------------------------------------
[2012-03-12 03:03:35] hufeng1987 at gmail dot com

Description:
------------
I am using PHP 5.4 and have run into trouble with htmlspecialchars and 
htmlentities.

The PHP 5.4 default charset is UTF-8.

I expected htmlspecialchars and htmlentities to use UTF-8 as the default 
encoding,

but even after I configured default_charset in my php.ini, htmlspecialchars and 
htmlentities still stubbornly use UTF-8.

This is a bad experience. My project is fairly big and htmlspecialchars is used 
everywhere, in almost 3 million calls.

I have no realistic way to specify the encoding by hand.


Adding the encoding to each call of htmlspecialchars and htmlentities is not 
possible; it is a huge change for me.


As another solution, why not let htmlspecialchars take its encoding from the 
php.ini settings?

Wouldn't that be a better way, and friendlier to users?

Sorry for my bad English.

Test script:
---------------
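<?php
// Assumes this file is saved as GB2312, so the literal holds GB2312 bytes.
$string = '<pre><p>我是测试</p></pre>';

// No encoding given: PHP 5.4 assumes UTF-8.
echo htmlspecialchars($string), "\n";
// Explicit encoding, keeping the PHP 5.4 default flags.
echo htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'GB2312'), "\n";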

Expected result:
----------------
htmlspecialchars should use the charset defined by default_charset in php.ini.



------------------------------------------------------------------------


