ID:               36775
 User updated by:  ez at daoldskool dot org
 Reported By:      ez at daoldskool dot org
-Status:           Feedback
+Status:           Open
 Bug Type:         WDDX related
 Operating System: OSX Tiger 10.4.5
 PHP Version:      5.1.2
 New Comment:

Well, tony, the problem is pretty self-evident:

if you don't want wddx_deserialize to mangle a UTF-8 encoded 
document, you have to run utf8_encode on it first, even though 
it is already UTF-8

doesn't this sound weird to you? wddx_deserialize only works 
on a document that has been UTF-8 encoded twice

it's crazy!
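
to make "encoded twice" concrete, here is a minimal C sketch. 
latin1_to_utf8 is a hypothetical helper, not code from PHP; it 
only assumes that utf8_encode treats every input byte as one 
ISO-8859-1 character and re-encodes it as UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper (not from the PHP sources): re-encode `len` bytes
 * of ISO-8859-1 as UTF-8. `out` needs room for 2 * len + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII bytes are identical in both encodings */
            out[n++] = in[i];
        } else {
            /* U+0080..U+00FF becomes a two-byte UTF-8 sequence */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

feeding it the single ISO-8859-1 byte E9 (an e with acute accent) 
gives the correct UTF-8 bytes C3 A9. Feeding those two bytes back 
in gives C3 83 C2 A9, which is the "encoded twice" form that the 
deserializer apparently expects.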

the bug has already been reported several times and is still 
open:

http://bugs.php.net/bug.php?id=35241

and look at the user contributed notes in the documentation:

http://de2.php.net/manual/en/function.wddx-deserialize.php

it seems like this bug was introduced with the PHP 5 release

and YES wddx functions ARE using EXPAT :

from the 5.1.2 release sources :

ext/wddx.c, line 25 :
#include "ext/xml/expat_compat.h"

ext/wddx.c, line 1140 :
parser = XML_ParserCreate("ISO-8859-1");

---

BTW, why force the encoding here? Expat should detect the 
encoding from the encoding declaration in the document 
itself:
http://www.xml.com/pub/a/1999/09/expat/reference.html

all i am asking is to be able to work transparently on 
UTF-8 documents without the pain of encoding them twice

did you look at this code:
http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php
http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php.s

doesn't it look strange to you that i have to utf8_encode 
the XML stream before passing it to wddx_deserialize, when 
the XML stream is already UTF-8?

this is for real, check it!


Previous Comments:
------------------------------------------------------------------------

[2006-03-18 18:15:39] [EMAIL PROTECTED]

>it seems like wddx functions are still using the EXPAT xml parser
Only if you compiled them this way.

Sorry, I still don't get what the problem is and what you are
proposing.

------------------------------------------------------------------------

[2006-03-18 13:19:10] ez at daoldskool dot org

Got the cli binary compiled from sources (stable release 5.1.2 & cvs
trunk) on OS X, and could reproduce the bug with both

it seems like wddx functions are still using the EXPAT xml parser

according to the Expat API documentation, XML_ParserCreate can
detect the document encoding from the encoding declaration in the
XML header

otherwise, XML_ParserCreate can be forced to one of these 4 encodings:
US-ASCII, UTF-8, UTF-16, ISO-8859-1

so i am working on a bulletproof way to check the encoding
declaration in the xml header

only if the xml stream has no encoding declaration is it
legitimate to decode strings while parsing the tree
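
as a rough idea of what such a check could look like, here is a 
small C sketch. xml_declared_encoding is a hypothetical helper, 
not something that exists in ext/wddx or in Expat, and it assumes 
ASCII-compatible input with no byte order mark:

```c
#include <stddef.h>
#include <string.h>

static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: copy the value of the encoding pseudo-attribute
 * from the XML declaration into buf. Returns 1 if a non-empty value was
 * found, 0 otherwise. Does not handle UTF-16 input or a leading BOM. */
static int xml_declared_encoding(const char *xml, char *buf, size_t buflen)
{
    const char *end, *p;
    char quote;
    size_t n = 0;

    /* the declaration must be the very first thing in the stream */
    if (buflen == 0 || strncmp(xml, "<?xml", 5) != 0 || !is_xml_space(xml[5]))
        return 0;
    end = strstr(xml, "?>");
    if (end == NULL)
        return 0;

    /* only accept "encoding" if it sits inside the declaration */
    p = strstr(xml, "encoding");
    if (p == NULL || p >= end)
        return 0;
    p += strlen("encoding");

    while (p < end && is_xml_space(*p)) p++;
    if (p >= end || *p != '=')
        return 0;
    p++;
    while (p < end && is_xml_space(*p)) p++;
    if (p >= end || (*p != '"' && *p != '\''))
        return 0;
    quote = *p++;

    /* copy up to the closing quote, refusing to overflow buf */
    while (p < end && *p != quote) {
        if (n + 1 >= buflen)
            return 0;
        buf[n++] = *p++;
    }
    if (p >= end)
        return 0;
    buf[n] = '\0';
    return n > 0;
}
```

when this returns 1, the deserializer could leave the strings 
alone, or simply pass NULL as the encoding argument to 
XML_ParserCreate so that Expat follows the declaration by itself. 
Only a return of 0 would justify falling back to a default.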

MHO

am i missing something? does anyone agree?

anyone?

------------------------------------------------------------------------

[2006-03-17 19:49:24] ez at daoldskool dot org

alright, let's roll!

------------------------------------------------------------------------

[2006-03-17 19:33:30] [EMAIL PROTECTED]

You don't need any accounts to post the patch.

------------------------------------------------------------------------

[2006-03-17 19:29:57] ez at daoldskool dot org

Description:
------------
Hi folks!

i cannot figure out why this issue is still open

wddx serialization/deserialization MUST be reversible, symmetric and
scalable

if it's necessary to utf8_encode a string that's already UTF-8 encoded,
what's the point?

so something is broken here

any volunteers? if not, give me a developer account and i'll
fix it ;) for real!

here is another proof of concept:

http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php

compared to PEAR:

http://peoplemode.daoldskool.org:88/__dev/test/test_PEAR.php

Thanks anyway, comments much appreciated

Regards

Antonin



------------------------------------------------------------------------

