Edit report at https://bugs.php.net/bug.php?id=43148&edit=1

 ID:                 43148
 Comment by:         sebastian dot mayer at maysoft dot de
 Reported by:        banu_daniel1 at yahoo dot com
 Summary:            filesize and unicode filenames
 Status:             Not a bug
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   windows xp 32 bits
 PHP Version:        5.2.4
 Block user comment: N
 Private report:     N

 New Comment:

Hello,

running PHP 5.3.5 on Windows, I have a problem with the degree character "°".
scandir() (or opendir()/readdir()) returns an entry called
"Up-wards at 45°.mp3". If I then try to get the file size or file time, the
message "filesize(): stat failed" comes up. Other non-ASCII characters (for
example the German umlauts ä, ü ...) don't have this problem.


Previous Comments:
------------------------------------------------------------------------
[2010-11-10 18:11:37] paj...@php.net

Yes, and I'm working on this change, it will accept UTF-8 as input just like 
what we do on Unices/POSIX systems.

------------------------------------------------------------------------
[2010-11-10 17:33:14] anton85s at mail dot ru

"it just passes the filename to the OSes filesystem func and if it fails - we 
can do nothing about it."
But it doesn't pass the filename to the Unicode version of the filesystem 
function, right? That means PHP could at least be modified to use the correct 
filesystem functions, rather than the non-Unicode ones, for all calls.

------------------------------------------------------------------------
[2007-11-12 10:03:04] tony2...@php.net

PHP doesn't care if it's Unicode or not, it just passes the filename to the 
OSes filesystem func and if it fails - we can do nothing about it.

------------------------------------------------------------------------
[2007-11-02 17:48:17] carsten_sttgt at gmx dot de

> but the problem is still there even on windows xp
> so this is the problem filesize function dose not
> work with filenames with unicode characters.

OK, after some more tests, I can reproduce this problem. Just look at this 
shell log:
| D:\>cd D:\Apache2.2\htdocs\test\αβγδεζηθ
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>dir /b
| index.html
| phpinfo.php
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type index.html
| <html><body><h1>It works!</h1></body></html>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type phpinfo.php
| <?php phpinfo(); ?>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>pear-request http://localhost/test/%ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8/index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\αβγδεζηθ>php -r "echo getcwd();"
| D:\Apache2.2\htdocs\test\aß?de???
| D:\Apache2.2\htdocs\test\αβγδεζηθ>cd..
|
| D:\Apache2.2\htdocs\test>php -r "var_dump(stat('αβγδεζηθ'));"
|
| Warning: stat(): stat failed for aß?de??? in Command line code on line 1
| bool(false)
|
| D:\Apache2.2\htdocs\test>

As you can see, I can't execute a PHP script in this folder ("αβγδεζηθ") or 
use the PHP filesystem functions with this path. But I can access this folder 
correctly with Apache via HTTP.
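
For reference, the percent-encoded path in the HTTP request above is simply 
the UTF-8 byte sequence of the folder name. A minimal sketch, assuming the 
script file itself is saved as UTF-8 (the variable names are only 
illustrative):

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal holds UTF-8 bytes.
$folder = 'αβγδεζηθ';

// rawurlencode() percent-encodes every byte outside the unreserved set.
// It emits upper-case hex digits; lower-casing only matches the shell log.
$encoded = strtolower(rawurlencode($folder));

echo $encoded, "\n";

// strlen() counts bytes: each of these Greek letters needs two bytes in UTF-8.
echo strlen($folder), "\n";
```

This prints the same escape sequence that was used in the request, followed 
by the byte length 16 for the 8 letters.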


> on linux version i don't have this problem.

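That's the difference. On Linux, filenames are plain byte strings (in practice 
usually UTF-8), and PHP passes them through unchanged. Windows, however, uses 
UTF-16 in its wide-character API, and the current codepage of the installed 
locale in its non-Unicode (ANSI) API.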
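
To make the difference concrete, here is a small sketch that dumps the bytes 
of the same three letters in both encodings. It assumes the file is saved as 
UTF-8 and that the mbstring extension is loaded. It only illustrates the 
encodings and is not what PHP does internally:

```php
<?php
// Assumptions: file saved as UTF-8; mbstring extension available.
$name = 'αβγ';

// Bytes as PHP holds them (UTF-8, two bytes per letter here).
$utf8Hex = bin2hex($name);

// The same text as UTF-16LE, the form the Windows wide-character API expects.
$utf16Hex = bin2hex(mb_convert_encoding($name, 'UTF-16LE', 'UTF-8'));

echo "UTF-8:    $utf8Hex\n";
echo "UTF-16LE: $utf16Hex\n";
```

The two byte sequences have nothing in common, so a function expecting one 
encoding cannot make sense of a name supplied in the other.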


Just look at this script "test.php" (encoded in UTF-8):
| <?php
| mkdir('αβγδεζηθ');
| var_dump(is_dir('αβγδεζηθ'));
| ?>

and the shell log:
| D:\Apache2.2\htdocs\test>php test.php
| bool(true)
|
| D:\Apache2.2\htdocs\test>dir /b
| test.php
| αβγδεζηθ
| 
| D:\Apache2.2\htdocs\test>

As you can see, you can create and access paths with such a name from PHP, 
but only inside PHP. In Windows or Apache you must use another (wrong) name. 
In this case PHP is just treating the bytes of the UTF-8 chars as Latin1 chars.

This can be a quick workaround for you, but it is not a correct solution.
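
The byte reinterpretation described above can be reproduced in a few lines. 
This is only a sketch, assuming the file is saved as UTF-8 and the mbstring 
extension is loaded. ISO-8859-1 stands in for the Latin1 codepage here, and 
Windows-1252 agrees with it for the bytes involved:

```php
<?php
// Assumptions: file saved as UTF-8; mbstring extension available.
$name = 'αβγδεζηθ';

// Read every UTF-8 byte as one Latin1 character, then re-encode the result
// as UTF-8 so that it can be displayed.
$misread = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n";

// Reversing the step restores the original bytes, which is why the name
// keeps working as long as only PHP touches it.
$restored = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $name);
```

The first line of output is the 16-character name "Î±Î²Î³Î´ÎµÎ¶Î·Î¸", 
the second is bool(true).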

The problem is that PHP only uses the simple string and filesystem functions 
in its C sources, which only work with the codepage of the current locale. It 
does not use the wide-char string and filesystem functions from the Windows 
SDK, as Apache does.

BTW:
With a current PHP6 snap (full Unicode support?), this also doesn't work.

Regards,
Carsten

BTW:
There is another bug in this bugtracker. You can't use UTF-8 chars in bug 
reports: after submitting a comment, UTF-8 chars are replaced with 
entities, but all comments are placed between <pre> tags. Thus the browser 
shows the entities and not the correct chars.

Please open this html page with a browser:
| <html>
| <head>
| <meta http-equiv=content-type content="text/html; charset=UTF-8">
| </head>
| <body>
| &#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;
| </body>
| </html>
and replace all entities in my comment with the chars you can see in the 
browser.

------------------------------------------------------------------------
[2007-11-01 22:11:12] banu_daniel1 at yahoo dot com

No, I didn't see that. I removed that " and the result is exactly the same 
(Array ( )).
I've tried with other folders (non-UTF) and it works.

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=43148

