ID:               49475
User updated by:  elmue at gmx dot de
Reported By:      elmue at gmx dot de
Status:           Bogus
Bug Type:         *Unicode Issues
Operating System: Windows
PHP Version:      6SVN-2009-09-05 (snap)
New Comment:

Hello

> In the meantime you can use mbstring to convert to your local
> encoding

I suppose that you did not read what I explained.

How am I supposed to use mbstring to convert anything if an empty
string is returned?
mbstring can only convert an empty string into another empty string!
That would not be very useful!
And mbstring also can't convert "??????.txt" into anything useful.

The code that I posted works fine on PHP 5 (at least if I don't use
Greek or Russian characters), but on PHP 6 it is broken.

There is no way!
On PHP 6 you currently can't work with filenames that contain an accent
or an umlaut. It's worse than PHP 5.

Elmü


Previous Comments:
------------------------------------------------------------------------

[2009-09-05 22:05:25] paj...@php.net

There is already a feature request about that. In the meantime you can
use mbstring to convert to your local encoding (check your prefs and
verify which encoding you have to use). But real Unicode support for
file operations will not be available soon, early next year at the
soonest.
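The mbstring route suggested here could look roughly like the sketch
below. It assumes that readdir() hands back the raw bytes of the name in
the Windows ANSI code page (as PHP 5 does) and that this code page is
Windows-1252; check which code page your system actually uses. The
helper name ansi_filename_to_utf8() is hypothetical. This only helps
while the raw bytes are still available; it cannot restore characters
that have already been replaced by "?", as the reporter points out.

```php
<?php
// Minimal sketch: re-encode a file name that readdir() returned as raw
// ANSI code-page bytes into UTF-8.
// Assumptions: mbstring is loaded and the ANSI code page is Windows-1252
// (Western European). ansi_filename_to_utf8() is a hypothetical helper.
function ansi_filename_to_utf8(string $raw, string $codePage = 'Windows-1252'): string
{
    // Pure ASCII names pass through unchanged; bytes such as 0xE9 become
    // their UTF-8 multi-byte form (0xE9 -> 0xC3 0xA9).
    return mb_convert_encoding($raw, 'UTF-8', $codePage);
}

// "T\xE9st.txt" is how "Tést.txt" is stored in Windows-1252.
echo bin2hex(ansi_filename_to_utf8("T\xE9st.txt")), "\n"; // 54c3a973742e747874
```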


------------------------------------------------------------------------

[2009-09-05 21:57:04] elmue at gmx dot de

Description:
------------
Hello

I am using a PHP 6 snapshot (VC6 build) compiled on 3 September 2009.

How to reproduce the bug:

Create a file:
C:\Temp\Tést.txt
(note the accent on the e)

Execute the code below.

What happens is the warning:
"Could not convert binary string to Unicode string (converter UTF-8
failed on bytes (0xE9) at offset 1)"

(0xE9 is the code of the 'é' character in the Windows-1252 / ISO-8859-1
code page; 'é' is not an ASCII character)

and an empty string is returned in $File.
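The warning can be reproduced outside the filesystem layer. The sketch
below uses ordinary PHP with mbstring (not the PHP 6 converter itself)
on the Windows-1252 bytes of "Tést.txt". It shows that they are not
valid UTF-8: in UTF-8, 0xE9 can only start a three-byte sequence, and
the following byte 0x73 ('s') is not a continuation byte. The offending
byte sits at offset 1, matching the offset in the warning.

```php
<?php
// Windows-1252 bytes for "Tést.txt": 0xE9 is 'é' in that code page.
$raw = "T\xE9st.txt";

// Not valid UTF-8: 0xE9 (1110 1001) announces a 3-byte sequence,
// but the next byte 0x73 ('s') is not a continuation byte (10xx xxxx).
var_dump(mb_check_encoding($raw, 'UTF-8'));   // bool(false)

// Byte-wise search (no /u modifier) for the first non-ASCII byte.
preg_match('/[\x80-\xFF]/', $raw, $m, PREG_OFFSET_CAPTURE);
printf("0x%02X at offset %d\n", ord($m[0][0]), $m[0][1]); // 0xE9 at offset 1

// The same name encoded as UTF-8 passes the check.
var_dump(mb_check_encoding("T\u{00E9}st.txt", 'UTF-8')); // bool(true)
```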

If the filename contains Russian or Greek characters, it is even worse:
in this case no warning is displayed and the filename is returned as
"??????.txt"

This warning message is nonsense.
All Windows operating systems store filenames in Unicode, except
Windows 95, 98 and ME, which are out of date.

So there is no reason to put the filename through a UTF-8 converter, as
the warning suggests.
There is no conversion required on Windows if the correct API is used.
Windows offers the old FindFirstFileA(...) API and the Unicode
FindFirstFileW(...) API. I hope the PHP programmers did not make the
mistake of using the ANSI versions, which are code-page dependent and
cause a LOT of problems.

The Wide API like FindFirstFileW(...) returns ALL filenames directly in
Unicode. There is NO CONVERSION required on Windows and there is NO
UTF-8 converter required.
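To illustrate the point about the wide API: the ...W functions exchange
names as UTF-16 (little-endian on Windows), where 'é' is simply the code
unit U+00E9 regardless of the active code page. The sketch below only
uses mbstring to show those bytes for "Tést.txt"; it does not call the
Windows API itself.

```php
<?php
// UTF-8 source string for the file name "Tést.txt".
$name = "T\u{00E9}st.txt";

// UTF-16LE is the encoding the ...W functions use for WCHAR strings;
// every character here fits in one 16-bit code unit.
$wide = mb_convert_encoding($name, 'UTF-16LE', 'UTF-8');
echo bin2hex($wide), "\n"; // 5400e900730074002e00740078007400

// The round trip back to UTF-8 is lossless, whereas a detour through an
// 8-bit code page loses characters that the code page lacks.
var_dump(mb_convert_encoding($wide, 'UTF-8', 'UTF-16LE') === $name); // bool(true)
```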

I also played around with different settings for
ini_set("unicode.filesystem_encoding", "...")

but the error stays the same.
There is a design error deep in the code.

Elmü


Reproduce code:
---------------
<?php
$hDir = opendir("C:\\Temp");
while ($hDir) 
{
    $File = readdir($hDir);       // <--- produces warning
    if ($File === false) break;
    echo "File=$File<br>";
}
?>

Expected result:
----------------
correct filename
no warning

Actual result:
--------------
the filename is returned as an empty string or as "?????.txt"


------------------------------------------------------------------------

