Edit report at http://bugs.php.net/bug.php?id=54028&edit=1

 ID:                 54028
 Comment by:         carsten_sttgt at gmx dot de
 Reported by:        schmale at froglogic dot com
 Summary:            Directory::read() cannot handle non-unicode chars
                     properly
 Status:             Bogus
 Type:               Bug
 Package:            Directory function related
 Operating System:   Windows 7
 PHP Version:        5.3.5
 Block user comment: N
 Private report:     N

 New Comment:

| and the problem does only occur with Windows/CLI.



I see no difference between CGI and CLI (both executed from the shell).

Still, one thing is curious:

<?php
$directory = dir(getenv('USERPROFILE'));
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        printf('Returned non-utf-8 (%s)', $content);
        printf(" Encoding: %s\r\n", mb_detect_encoding($content));
    }
}
?>



And the output is:

Returned non-utf-8 (Startmenü) Encoding: UTF-8
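
One possible explanation for this contradiction, offered as an assumption rather than a confirmed diagnosis: mb_detect_encoding() is lenient unless its third (strict) argument is true, and then returns the closest match instead of false. The sketch below assumes the name arrives as Windows-1252 bytes, which the report does not confirm.

```php
<?php
// "Startmenü" as Windows-1252 bytes (assumed code page): ü is the single byte 0xFC.
$name = "Startmen\xFC";

// Validation fails, because a lone 0xFC byte is not valid UTF-8.
$isUtf8 = mb_check_encoding($name, 'UTF-8');

// Strict detection returns false when the string is not valid in any candidate.
$detected = mb_detect_encoding($name, 'UTF-8', true);

var_dump($isUtf8);    // bool(false)
var_dump($detected);  // bool(false)
```

With strict detection, both functions agree that the name is not UTF-8.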





Regards,

Carsten


Previous Comments:
------------------------------------------------------------------------
[2011-02-15 17:10:43] schmale at froglogic dot com

Well, I don't know what encoding Windows uses, but I do know that it
works properly with the Windows CGI version. The point is that a
directory called 'Startmenü' is returned as 'Startmenü' with Linux/CGI,
Linux/CLI, and Windows/CGI, but NOT with Windows/CLI, which returns
'Startmenñæ' (or something similar). In other words, the behaviour with
Windows/CLI is broken, whereas the other versions return the exact name
of the directory, as expected.



So I think it has little to do with Unicode filesystem support or the
encoding Windows uses, and more to do with differences between CGI and
CLI.

------------------------------------------------------------------------
[2011-02-15 16:54:17] paj...@php.net

There is already a feature request for unicode filesystem support.



Btw, Windows does not use UTF-8 for its encoding.
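
If the names do arrive in a Windows code page, a script could convert them itself. The following is a minimal sketch, assuming the code page is Windows-1252 (typical for Western European systems, but not confirmed in this report); toUtf8() is a hypothetical helper, not a PHP function.

```php
<?php
// Hypothetical helper: convert a name returned by Directory::read()
// from an assumed Windows-1252 code page to UTF-8.
function toUtf8($name, $sourceEncoding = 'Windows-1252')
{
    // Names that are already valid UTF-8 (including plain ASCII) pass through.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    // Otherwise reinterpret the bytes in the assumed source encoding.
    return mb_convert_encoding($name, 'UTF-8', $sourceEncoding);
}
```

The pass-through check is a heuristic: a code-page string whose bytes happen to form valid UTF-8 would be left unconverted.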

------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com

Description:
------------
Note: This problem affects ONLY the CLI interpreter, NOT the CGI.



Using dir('path/to/dir'), the read() method does not return UTF-8 if
the directory contains e.g. umlauts (ä, ö, ü). I tested this on Linux
and Windows, with both CGI and CLI, and the problem occurs only with
Windows/CLI.

Test script:
---------------
$path = 'path/to/directory/which/contains/umlauts';

$directory = dir($path);
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, 'Returned non-utf-8 (%s)', $content);
    }
}



Expected result:
----------------
The expected result, of course, was that the return value of read() is
always encoded in UTF-8, i.e. no messages are printed when we run the
script.

Actual result:
--------------
If a subdirectory name contains umlauts (or, I guess, any non-ASCII
character), a message is printed, i.e. the return value is not encoded
in UTF-8.


------------------------------------------------------------------------


