#22108 [Asn]: php doesn't ignore the utf-8 BOM

moriyoshi Wed, 04 Jun 2003 18:02:58 -0700

 ID:               22108
 Updated by:       [EMAIL PROTECTED]
 Reported By:      bugzilla at jellycan dot com
 Status:           Assigned
 Bug Type:         Feature/Change Request
 Operating System: Any
 PHP Version:      All (as of the current implementation)
 Assigned To:      moriyoshi
 New Comment:


And just for clarification, this is a scanner problem, irrelevant to
the parser.
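The BOM is the three-byte sequence EF BB BF at the very start of the file, so handling it in the scanner could amount to skipping those bytes before tokenizing. The Python below is only a minimal sketch of that idea. It assumes the scanner is handed the raw file bytes, and `strip_utf8_bom` is a hypothetical helper, not part of PHP's actual scanner (which is written in C).

```python
import codecs

# codecs.BOM_UTF8 is b"\xef\xbb\xbf", i.e. U+FEFF encoded as UTF-8.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(source: bytes) -> bytes:
    """Hypothetical pre-scan step: drop one leading UTF-8 BOM, if present."""
    if source.startswith(UTF8_BOM):
        return source[len(UTF8_BOM):]
    return source


if __name__ == "__main__":
    script = UTF8_BOM + b"<?php header('Content-Type: text/plain'); ?>"
    # Without stripping, the three BOM bytes sit in front of "<?php",
    # outside any PHP block.
    print(script[:3])
    print(strip_utf8_bom(script)[:5])
```

Only a single leading BOM is removed; a BOM anywhere else in the file is left alone. Because unstripped BOM bytes before `<?php` are sent as output, they can also cause a later header() call to fail with a "headers already sent" warning (unless output buffering is on).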



Previous Comments:
------------------------------------------------------------------------

[2003-06-04 02:45:43] [EMAIL PROTECTED]

It wasn't assigned, just set to open (and I didn't notice your name in
the "Assign to" field).

------------------------------------------------------------------------

[2003-06-04 02:40:30] [EMAIL PROTECTED]

Derick,

Please do not change the status of a bug that is already assigned to
someone.

It makes no sense to say that PHP can only handle ASCII documents: if
you want to write German in PHP, for example, you need at least
ISO-8859-1 or ISO-8859-15, and characters such as umlauts are not part
of ASCII.


------------------------------------------------------------------------

[2003-06-03 14:17:22] [EMAIL PROTECTED]

Feel free to rewrite the parser, but that's just not going to happen.
We want ASCII input, not Unicode.

------------------------------------------------------------------------

[2003-06-03 14:07:16] gump at hotmail dot com

> [8 Feb 4:24am CST] [EMAIL PROTECTED]

> PHP doesn't want UNICODE scripts, but just ASCII ones. Not 
> a bug -> bogus.

Not bogus.  

PHP is embedded in HTML, and the surrounding document determines the
encoding. You can't just specify this problem out of existence.

------------------------------------------------------------------------

[2003-05-05 03:40:23] tokiee at sayclub dot com

For those who are not familiar with UTF-8:

UTF-8 (UCS Transformation Format, 8-bit) is not so different from
ASCII; it is backward compatible with it. If you write English text in
UTF-8, every byte is identical to the same text written in ASCII. (The
UTF-8 BOM is optional.)
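A quick way to check the compatibility claim: pure-ASCII text produces identical bytes in both encodings, and the optional BOM is just the character U+FEFF encoded as three extra bytes. This sketch uses Python's standard codecs; `ascii_and_utf8_agree` is a name made up for illustration.

```python
def ascii_and_utf8_agree(text: str) -> bool:
    """True if text encodes to the same bytes in ASCII and UTF-8.

    Raises UnicodeEncodeError if text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")


# The optional BOM is the character U+FEFF; Python's "utf-8-sig" codec
# writes it at the start of the output.
BOM_BYTES = "\ufeff".encode("utf-8")

if __name__ == "__main__":
    print(ascii_and_utf8_agree("if you write your text in english"))  # True
    print(BOM_BYTES)                   # b'\xef\xbb\xbf'
    print("abc".encode("utf-8-sig"))   # b'\xef\xbb\xbfabc'
```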

This is not an exact explanation of UTF-8, but: UTF-8 extends ASCII to
support the full Unicode character set without disturbing the existing
ASCII characters or their order. So basically, UTF-8 is a superset of
ASCII, and you don't have to imagine it as some totally new freak.

Actually, when a modern Unicode-aware OS reads UTF-8, it often converts
it to its own internal Unicode representation (for example, UTF-16).
In that sense UTF-8 is a transfer encoding, somewhat similar to URL
encoding.

In the ASCII world, each byte corresponds to one character. ASCII
itself defines 128 characters; 8-bit extensions such as ISO-8859-1
allow up to 256.

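In UCS-2, the original 16-bit form of Unicode, two bytes correspond to
one character, for up to 65,536 characters. That really is a new
system, as you might think. (Unicode has since grown beyond 16 bits,
up to U+10FFFF.)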
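To make the byte counts concrete, the sketch below measures bytes per character, with ISO-8859-1 as the one-byte-per-character example and UTF-16 (big-endian, no BOM) standing in for the two-byte form. It also shows that a character above U+FFFF does not fit in two bytes and needs a four-byte surrogate pair. `bytes_per_char` is a hypothetical helper.

```python
from typing import List


def bytes_per_char(text: str, encoding: str) -> List[int]:
    """Number of bytes each character of text occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


if __name__ == "__main__":
    word = "Gr\u00fc\u00dfe"  # "Grüße"
    print(bytes_per_char(word, "iso-8859-1"))         # [1, 1, 1, 1, 1]
    print(bytes_per_char(word, "utf-16-be"))          # [2, 2, 2, 2, 2]
    print(bytes_per_char("\U0001F600", "utf-16-be"))  # [4] (surrogate pair)
```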

In UTF-8, interestingly, a character can take one, two, three or even
four bytes. That may sound complicated, but the rule is simple, and you
don't actually have to handle it yourself: the OS will do it for you.
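The "simple rule" can be spelled out: for multi-byte sequences, the number of leading 1 bits in the first byte gives the sequence length, and every continuation byte starts with the bits 10. The hand-written encoder below is a sketch for illustration only (`utf8_encode_code_point` is a hypothetical name); in practice you would rely on the OS or the language's built-in codec, as the comment says.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point by hand, following the UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:       # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in ["A", "\u00e4", "\u20ac", "\U0001F600"]:
        encoded = utf8_encode_code_point(ord(ch))
        assert encoded == ch.encode("utf-8")  # agrees with Python's codec
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Feeding it U+FEFF yields EF BB BF, which is exactly the UTF-8 BOM this report is about.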

Even if some of your software does not understand UTF-8, that's totally
okay, because UTF-8 is ASCII-transparent. So it "can be used with
normal string comparison functions for sorting and such." (quoted from
the PHP.NET Reference: utf8_encode())
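The sorting claim holds because comparing UTF-8 strings byte by byte gives the same order as comparing their Unicode code points. Note that this is code-point order, not language-aware collation: "äpfel" sorts after "zebra". A small check, with `utf8_byte_order_matches_code_points` as a made-up name:

```python
from typing import Iterable


def utf8_byte_order_matches_code_points(words: Iterable[str]) -> bool:
    """Compare sorting by raw UTF-8 bytes with sorting by code points."""
    words = list(words)
    by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
    by_code_points = sorted(words)  # Python compares str by code point
    return by_bytes == by_code_points


if __name__ == "__main__":
    sample = ["zebra", "\u00e4pfel", "apple", "\u20ac", "\U0001F600", "Zoo"]
    print(utf8_byte_order_matches_code_points(sample))  # True
    print(sorted(sample, key=lambda w: w.encode("utf-8")))
```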

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108
