ID: 36775
User updated by: ez at daoldskool dot org
Reported By: ez at daoldskool dot org
-Status: Feedback
+Status: Open
Bug Type: WDDX related
Operating System: OSX Tiger 10.4.5
PHP Version: 5.1.2
New Comment:

Well, Tony, the problem is pretty self-evident: if you don't want
wddx_deserialize to mess up a UTF-8 encoded document, you have to
utf8_encode it again before passing it in. Doesn't this sound weird to you?
wddx_deserialize only works on documents that have been UTF-8 encoded twice.
It's crazy!

The bug has already been reported several times and is still open:
http://bugs.php.net/bug.php?id=35241

Also look at the user contributions in the documentation:
http://de2.php.net/manual/en/function.wddx-deserialize.php

It seems this bug was introduced with release 5.

And YES, the wddx functions ARE using EXPAT. From the 5.1.2 release sources:

ext/wddx.c, line 25:
#include "ext/xml/expat_compat.h"

ext/wddx.c, line 1140:
parser = XML_ParserCreate("ISO-8859-1");

---

BTW, why force the encoding here? EXPAT should recognize the encoding
from the encoding declaration in the document itself:
http://www.xml.com/pub/a/1999/09/expat/reference.html

All I am asking is to be able to work transparently on Unicode documents
without the pain of encoding them twice.

Did you look at this code?
http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php
http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php.s

Doesn't it look strange to you that I have to utf8_encode the XML stream
before passing it to wddx_deserialize, when the XML stream is already
Unicode? This is for real, check it!


Previous Comments:
------------------------------------------------------------------------

[2006-03-18 18:15:39] [EMAIL PROTECTED]

>it seems like wddx functions are still using the EXPAT xml parser

Only if you compiled them this way.
Sorry, I still don't get what the problem is and what you are proposing.

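The question raised in the new comment above (why pass "ISO-8859-1" to XML_ParserCreate instead of letting the parser read the encoding declaration) can be illustrated with a small sketch. It uses Python's xml.parsers.expat binding to Expat, not PHP, so it only shows how Expat itself treats a forced encoding; it does not reproduce whatever extra conversion wddx.c performs afterwards. The helper name parse_text and the sample document are assumptions made for illustration.

```python
import xml.parsers.expat


def parse_text(data, forced_encoding=None):
    """Return the character data Expat reports for `data` (hypothetical helper).

    When `forced_encoding` is given, it overrides the encoding declared
    in the document, which mirrors passing an encoding name to
    XML_ParserCreate in C.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(forced_encoding)
    # Expat may deliver text in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# A UTF-8 document that declares its own encoding ("caf\u00e9" is "cafe"
# with an acute accent on the final letter).
doc = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<string>caf\u00e9</string>').encode("utf-8")

# No forced encoding: the declaration in the document is honoured.
auto = parse_text(doc)

# Forced ISO-8859-1: each UTF-8 byte is read as a separate Latin-1 character.
forced = parse_text(doc, "ISO-8859-1")

print(repr(auto))
print(repr(forced))
```

With no forced encoding the text is expected to come back intact, while the forced ISO-8859-1 parser is expected to turn the two bytes of the accented letter into two separate characters, regardless of what the declaration says.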
------------------------------------------------------------------------

[2006-03-18 13:19:10] ez at daoldskool dot org

I compiled the CLI binary from sources (stable release 5.1.2 and CVS
trunk) on OS X and could reproduce the bug.

It seems the wddx functions are still using the EXPAT XML parser.

According to the EXPAT API documentation, XML_ParserCreate can recognize
the document encoding from the declaration in the document header.
Otherwise, XML_ParserCreate can work with these 4 encodings:
US-ASCII, UTF-8, UTF-16, ISO-8859-1

So I am working on a bulletproof way to check the encoding declaration
in the XML header. Only if the XML stream has no encoding declaration
is it legitimate to decode strings while parsing the tree, IMHO.

Am I missing something? Does anyone agree? Anyone?

------------------------------------------------------------------------

[2006-03-17 19:49:24] ez at daoldskool dot org

Alright, let's roll!

------------------------------------------------------------------------

[2006-03-17 19:33:30] [EMAIL PROTECTED]

You don't need any accounts to post the patch.

------------------------------------------------------------------------

[2006-03-17 19:29:57] ez at daoldskool dot org

Description:
------------
Hi folks!

I cannot figure out why this issue is still open.

WDDX serialization/deserialization MUST be reversible, symmetric and
scalable. If it's necessary to utf8_encode a string that's already
encoded, what's the point? You are breaking something here.

Any volunteers? If not, give me a developer account and I'll fix it ;)
For real!

Here is another proof of concept:
http://peoplemode.daoldskool.org:88/__dev/test/test_NATIVE.php

compared to PEAR:
http://peoplemode.daoldskool.org:88/__dev/test/test_PEAR.php

Thanks anyway, comments very appreciated.

Regards,
Antonin

------------------------------------------------------------------------
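
The 2006-03-18 13:19:10 comment mentions checking the encoding declaration in the XML header before deciding whether to decode strings. A minimal sketch of such a check follows, in Python rather than PHP or C. The function name declared_encoding is hypothetical, and the sketch assumes an ASCII-compatible byte stream, so a UTF-16 document would need a separate byte order mark check.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# [^>]* cannot cross the closing '>' of the declaration, so attributes
# named "encoding" on later elements are not picked up.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if absent."""
    # Skip a UTF-8 byte order mark if one precedes the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><a/>'))
print(declared_encoding(b'<?xml version="1.0"?><a/>'))
```

A caller could fall back to a default encoding only when this returns None, which is the rule the comment proposes.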