Edit report at https://bugs.php.net/bug.php?id=55646&edit=1

 ID:                 55646
 Updated by:         cataphr...@php.net
 Reported by:        zedwoodnoreply at gmail dot com
 Summary:            decoding csr corrupts UTF8 characters
-Status:             Open
+Status:             Closed
 Type:               Bug
 Package:            OpenSSL related
 Operating System:   Ubuntu 10.04
 PHP Version:        5.3.8
-Assigned To:
+Assigned To:        cataphract
 Block user comment: N
 Private report:     N

 New Comment:

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

For Windows:

http://windows.php.net/snapshots/

Thank you for the report, and for helping us make PHP better.

Fixed, but for 5.4 only due to backward compatibility concerns. Thank you.


Previous Comments:
------------------------------------------------------------------------
[2011-09-12 17:22:55] cataphr...@php.net

Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=316562
Log: - Fixed bug #55646: textual data is returned in UTF-8, but is input in
  another encoding. 5.4 only as this implies a BC break.

------------------------------------------------------------------------
[2011-09-08 18:01:20] zedwoodnoreply at gmail dot com

I saved the csr that php generated...
and then parsed it with:

openssl asn1parse -in a.csr

    0:d=0  hl=4 l= 687 cons: SEQUENCE
    4:d=1  hl=4 l= 407 cons: SEQUENCE
    8:d=2  hl=2 l=   1 prim: INTEGER           :00
   11:d=2  hl=2 l= 106 cons: SEQUENCE
   13:d=3  hl=2 l=  11 cons: SET
   15:d=4  hl=2 l=   9 cons: SEQUENCE
   17:d=5  hl=2 l=   3 prim: OBJECT            :countryName
   22:d=5  hl=2 l=   2 prim: PRINTABLESTRING   :US
   26:d=3  hl=2 l=  13 cons: SET
   28:d=4  hl=2 l=  11 cons: SEQUENCE
   30:d=5  hl=2 l=   3 prim: OBJECT            :stateOrProvinceName
   35:d=5  hl=2 l=   4 prim: PRINTABLESTRING   :Utah
   41:d=3  hl=2 l=  15 cons: SET
   43:d=4  hl=2 l=  13 cons: SEQUENCE
   45:d=5  hl=2 l=   3 prim: OBJECT            :localityName
   50:d=5  hl=2 l=   6 prim: PRINTABLESTRING   :Lindon
   58:d=3  hl=2 l=  16 cons: SET
   60:d=4  hl=2 l=  14 cons: SEQUENCE
   62:d=5  hl=2 l=   3 prim: OBJECT            :organizationName
   67:d=5  hl=2 l=   7 prim: PRINTABLESTRING   :Chinese
   76:d=3  hl=2 l=  15 cons: SET
   78:d=4  hl=2 l=  13 cons: SEQUENCE
   80:d=5  hl=2 l=   3 prim: OBJECT            :organizationalUnitName
   85:d=5  hl=2 l=   6 prim: T61STRING         :IT 互
   93:d=3  hl=2 l=  24 cons: SET
   95:d=4  hl=2 l=  22 cons: SEQUENCE
   97:d=5  hl=2 l=   3 prim: OBJECT            :commonName
  102:d=5  hl=2 l=  15 prim: PRINTABLESTRING   :www.example.com
  119:d=2  hl=4 l= 290 cons: SEQUENCE
  123:d=3  hl=2 l=  13 cons: SEQUENCE
  125:d=4  hl=2 l=   9 prim: OBJECT            :rsaEncryption
  136:d=4  hl=2 l=   0 prim: NULL
  138:d=3  hl=4 l= 271 prim: BIT STRING
  413:d=2  hl=2 l=   0 cons: cont [ 0 ]
  415:d=1  hl=2 l=  13 cons: SEQUENCE
  417:d=2  hl=2 l=   9 prim: OBJECT            :md5WithRSAEncryption
  428:d=2  hl=2 l=   0 prim: NULL
  430:d=1  hl=4 l= 257 prim: BIT STRING

------------------------------------------------------------------------
[2011-09-08 17:57:51] zedwoodnoreply at gmail dot com

Description:
------------
I did this in command-line PHP with OpenSSL 0.9.8k (25 Mar 2009).

If I create a CSR containing a UTF-8 character, I ought to get the same
UTF-8 character back, untampered with, when I parse it.

Test script:
---------------
<?php
// Render a string as space-separated hex bytes.
function stringAsHex($string)
{
    $unpacked = unpack("H*", $string);
    return implode(" ", str_split($unpacked[1], 2));
}

$config = array(
    "digest_alg"       => "sha1",
    "x509_extensions"  => "v3_ca",
    "req_extensions"   => "v3_req",
    "private_key_bits" => 2048,
    "private_key_type" => OPENSSL_KEYTYPE_RSA,
    "encrypt_key"      => false,
);

$csr_info = array(
    "countryName"            => "US",
    "stateOrProvinceName"    => "Utah",
    "localityName"           => "Lindon",
    "organizationName"       => "Chinese",
    "organizationalUnitName" => "IT \xe4\xba\x92",
    "commonName"             => "www.example.com",
);

$private = openssl_pkey_new($config);
$csr_res = openssl_csr_new($csr_info, $private);
openssl_csr_export($csr_res, $csr);
//echo $csr;

$output = openssl_csr_get_subject($csr);

echo "A: ".$csr_info["organizationalUnitName"]."\n";
echo "B: ".stringAsHex($csr_info["organizationalUnitName"])."\n";
echo "C: ".$output['OU']."\n";
echo "D: ".stringAsHex($output['OU'])."\n";

Expected result:
----------------
A: IT 互
B: 49 54 20 e4 ba 92
C: IT 互
D: 49 54 20 e4 ba 92

Actual result:
--------------
A: IT 互
B: 49 54 20 e4 ba 92
C: IT äº
D: 49 54 20 c3 a4 c2 ba c2 92

------------------------------------------------------------------------
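
The extra bytes in line D of the actual result are consistent with each
input byte being read as an ISO-8859-1 code point and re-encoded as UTF-8,
which would fit the T61STRING tag shown in the asn1parse dump. The sketch
below reproduces that byte pattern in plain PHP. latin1ToUtf8() and
hexBytes() are hypothetical helpers written for this illustration, not part
of the OpenSSL extension, and the Latin-1 reading is an assumption inferred
from the hex output rather than something the report confirms.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 code point and
// encode it as UTF-8 (assumed model of the conversion seen in the report).
function latin1ToUtf8($bytes)
{
    $out = '';
    foreach (str_split($bytes) as $ch) {
        $b = ord($ch);
        if ($b < 0x80) {
            // ASCII bytes are unchanged in UTF-8.
            $out .= $ch;
        } else {
            // Code points 0x80..0xFF become a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical helper: same output format as stringAsHex() in the test script.
function hexBytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$input = "IT \xe4\xba\x92"; // organizationalUnitName value from the report

echo hexBytes($input), "\n";               // bytes that went in
echo hexBytes(latin1ToUtf8($input)), "\n"; // bytes after the assumed conversion
```

Run from the command line, this prints the B line followed by the D line of
the actual result, which supports (but does not prove) the Latin-1 reading.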