Edit report at https://bugs.php.net/bug.php?id=43148&edit=1
ID: 43148
Comment by: sebastian dot mayer at maysoft dot de
Reported by: banu_daniel1 at yahoo dot com
Summary: filesize and unicode filenames
Status: Not a bug
Type: Bug
Package: Filesystem function related
Operating System: windows xp 32 bits
PHP Version: 5.2.4
Block user comment: N
Private report: N

New Comment:

Hello,

running PHP 5.3.5 on Windows, I have a problem with the degree character "°".
scandir(), or opendir() with readdir(), returns an entry called
"Up-wards at 45°.mp3". If I then try to get its file size or file time, the
message "filesize(): stat failed" comes up.
Other characters (for example the German umlauts ä, ü ...) don't have this
problem.

Previous Comments:
------------------------------------------------------------------------
[2010-11-10 18:11:37] paj...@php.net

Yes, and I'm working on this change. It will accept UTF-8 as input, just like
what we do on Unices/POSIX systems.

------------------------------------------------------------------------
[2010-11-10 17:33:14] anton85s at mail dot ru

"it just passes the filename to the OS's filesystem function, and if it fails,
we can do nothing about it."

But it doesn't pass the filename to the Unicode version of the filesystem
function, right? That means PHP could at least be modified to use the correct
filesystem functions, rather than the non-Unicode ones, for all calls.

------------------------------------------------------------------------
[2007-11-12 10:03:04] tony2...@php.net

PHP doesn't care if it's Unicode or not; it just passes the filename to the
OS's filesystem function, and if it fails, we can do nothing about it.

------------------------------------------------------------------------
[2007-11-02 17:48:17] carsten_sttgt at gmx dot de

> but the problem is still there even on windows xp
> so this is the problem filesize function does not
> work with filenames with unicode characters.

OK, after some more tests, I can reproduce this problem.
Just look at this shell log:
| D:\>cd D:\Apache2.2\htdocs\test\αβγδεζηθ
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>dir /b
| index.html
| phpinfo.php
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type phpinfo.php
| <?php phpinfo(); ?>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>pear-request http://localhost/
| test/%ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8/index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\αβγδεζηθ>php -r "echo getcwd();"
| D:\Apache2.2\htdocs\test\aß?de???
| D:\Apache2.2\htdocs\test\αβγδεζηθ>cd..
|
| D:\Apache2.2\htdocs\test>php -r "var_dump(stat('αβγδεζηθ'));"
|
| Warning: stat(): stat failed for aß?de??? in Command line code on
| line 1
| bool(false)
|
| D:\Apache2.2\htdocs\test>

As you can see, I can't execute a PHP script in this folder ("αβγδεζηθ") or
use the PHP filesystem functions with this path. But I can access this folder
correctly with Apache via HTTP.

> on linux version i don't have this problem.

That's the difference. On Linux (or in PHP) you only have UTF-8, but Windows
uses UTF-16 (or the current code page for the installed locale).

Just look at this script "test.php" (encoded in UTF-8):
| <?php
| mkdir('αβγδεζηθ');
| var_dump(is_dir('αβγδεζηθ'));
| ?>

and the shell log:
| D:\Apache2.2\htdocs\test>php test.php
| bool(true)
|
| D:\Apache2.2\htdocs\test>dir /b
| test.php
| αβγδεζηθ
|
| D:\Apache2.2\htdocs\test>

As you can see, you can create and access paths with such a name from PHP, but
only inside PHP. In Windows or Apache you must use another (wrong) name. In
this case PHP is just using the byte sequence of the UTF-8 chars as Latin1
chars. This can be a quick fix for you, but it is indeed not correct.

The problem is that PHP only uses the simple string and filesystem functions
in its C sources, which only work with the current locale's code page.
But it does not use the wide-char string and filesystem functions from the
Windows SDK, as Apache does.

BTW: With a current PHP6 snap (full Unicode support?), this also doesn't work.

Regards,
Carsten

BTW:
There is another bug in this bug tracker. You can't use UTF-8 chars in bug
reports: after submitting a comment, UTF-8 chars are replaced with entities,
but all comments are placed between <pre> tags. Thus the browser shows the
entities and not the correct chars.

Please open this HTML page with a browser:
| <html>
| <head>
| <meta http-equiv=content-type content="text/html; charset=UTF-8">
| </head>
| <body>
| αβγδεζηθ
| </body>
| </html>

and replace all entities in my comment with the chars you can see in the
browser.

------------------------------------------------------------------------
[2007-11-01 22:11:12] banu_daniel1 at yahoo dot com

No, I didn't see that. I removed that " and the result is exactly the same
(Array ( )). I've tried with other folders (non-UTF) and it works.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=43148
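
A minimal sketch of the byte-level behaviour described in Carsten's comment, where the UTF-8 bytes of a filename are treated as Latin1 characters. It is an illustration only, not code from the report: it assumes the script is saved as UTF-8 and that the mbstring extension is loaded, and the variable names are hypothetical. It does not touch the filesystem, so it shows the reinterpretation itself rather than what any particular Windows code page would produce.

```php
<?php
// Illustrative only: the file must be saved as UTF-8.
$name = 'αβγδεζηθ';

// PHP strings are byte sequences: 8 Greek letters occupy 16 bytes in UTF-8.
$byteLength = strlen($name);

// Percent-encoding those bytes gives the path segment seen in the
// pear-request line of the shell log (lower-cased to match it).
$urlForm = strtolower(rawurlencode($name));

// Reinterpret every byte as one Latin1 (ISO-8859-1) character and
// re-encode the result as UTF-8. This approximates the "wrong" name
// that a byte-oriented, non-Unicode API ends up with.
$misread = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');
$misreadChars = mb_strlen($misread, 'UTF-8');

echo $byteLength, "\n";    // 16
echo $urlForm, "\n";       // %ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8
echo $misreadChars, "\n";  // 16
```

The 8-letter name turns into 16 unrelated characters once each byte is read as its own character, which is consistent with the comment's observation that the directory is reachable from PHP but appears under a different name elsewhere. The actual name shown on a Windows system would depend on the active code page, which this sketch does not model.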