Eric Blake <[email protected]> writes:
> On 08/17/2018 10:05 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as an unpaired
>> surrogate. Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <[email protected]>
>> Reviewed-by: Eric Blake <[email protected]>
>
> I might have dropped the R-b, to ensure the changes since v1 get
> re-reviewed.
I intended to, but screwed up. My apologies.
>> ---
>> qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
>> tests/check-qjson.c | 3 +--
>> 2 files changed, 40 insertions(+), 23 deletions(-)
>>
>
>> @@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>> qstring_append_chr(str, '\t');
>> break;
>> case 'u':
>> - cp = 0;
>> - for (i = 0; i < 4; i++) {
>> - if (!qemu_isxdigit(*ptr)) {
>> - parse_error(ctxt, token,
>> - "invalid hex escape sequence in string");
>> - goto out;
>> + cp = cvt4hex(ptr);
>> + ptr += 4;
>> +
>> + /* handle surrogate pairs */
>> + if (cp >= 0xD800 && cp <= 0xDBFF
>> + && ptr[0] == '\\' && ptr[1] == 'u') {
>> + /* leading surrogate followed by \u */
>> + cp = 0x10000 + ((cp & 0x3FF) << 10);
>> + trailing = cvt4hex(ptr + 2);
>> + if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
>> + /* followed by trailing surrogate */
>> + cp |= trailing & 0x3FF;
>> + ptr += 6;
>> + } else {
>> + cp = -1; /* invalid */
>> }
>> - cp <<= 4;
>> - cp |= hex2decimal(*ptr);
>> - ptr++;
>> }
>> if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>> parse_error(ctxt, token,
>> - "\\u%.4s is not a valid Unicode character",
>> - ptr - 3);
>> + "%.*s is not a valid Unicode character",
>> + (int)(ptr - beg), beg);
>
> The error reporting here has indeed been improved over v1.
>
> Reviewed-by: Eric Blake <[email protected]>
Thanks!
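
For readers following the thread, the surrogate-pair arithmetic in the hunk above can be tried in isolation. What follows is only a minimal standalone sketch, not the code in qobject/json-parser.c: cvt4hex_sketch() and decode_u_escape() are hypothetical names, the "-1 on bad hex" convention is an assumption, and the caller is assumed to reject any surrogate that is returned unpaired (in the patch that job falls to the mod_utf8_encode() failure path).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the patch's cvt4hex(): convert exactly four
 * hex digits to an integer. Returning -1 on a non-hex character is an
 * assumption of this sketch.
 */
static int cvt4hex_sketch(const char *s)
{
    int cp = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isxdigit(c)) {
            return -1;
        }
        cp <<= 4;
        cp |= isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
    }
    return cp;
}

/*
 * Decode one escape, with s pointing just past the initial "\u".
 * Stores the number of characters consumed in *len and returns the
 * code point, or -1 if the hex digits are bad or a leading surrogate
 * is followed by "\u" plus something that is not a trailing surrogate.
 * A surrogate that is not followed by "\u" is returned unchanged; the
 * caller is expected to reject it.
 */
static int decode_u_escape(const char *s, int *len)
{
    int cp, trailing;

    assert(s != NULL && len != NULL);

    cp = cvt4hex_sketch(s);
    *len = 4;

    if (cp >= 0xD800 && cp <= 0xDBFF && s[4] == '\\' && s[5] == 'u') {
        /* leading surrogate followed by \u */
        trailing = cvt4hex_sketch(s + 6);
        if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
            /* high ten bits from the leading half, low ten from the trailing */
            cp = 0x10000 + ((cp & 0x3FF) << 10) + (trailing & 0x3FF);
            *len = 10;
        } else {
            cp = -1;    /* invalid */
        }
    }
    return cp;
}
```

With this sketch, the input D83D\uDE00 (the text after the first "\u") combines to U+1F600 and consumes ten characters, while D83D\u0041 is rejected because U+0041 is not a trailing surrogate.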