On 08/17/2018 10:05 AM, Markus Armbruster wrote:
The JSON parser treats each half of a surrogate pair as an unpaired
surrogate. Fix it to recognize surrogate pairs.
Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
I might have dropped the R-b, to ensure the changes since v1 get
re-reviewed.
---
qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
tests/check-qjson.c | 3 +--
2 files changed, 40 insertions(+), 23 deletions(-)
@@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                 qstring_append_chr(str, '\t');
                 break;
             case 'u':
-                cp = 0;
-                for (i = 0; i < 4; i++) {
-                    if (!qemu_isxdigit(*ptr)) {
-                        parse_error(ctxt, token,
-                                    "invalid hex escape sequence in string");
-                        goto out;
+                cp = cvt4hex(ptr);
+                ptr += 4;
+
+                /* handle surrogate pairs */
+                if (cp >= 0xD800 && cp <= 0xDBFF
+                    && ptr[0] == '\\' && ptr[1] == 'u') {
+                    /* leading surrogate followed by \u */
+                    cp = 0x10000 + ((cp & 0x3FF) << 10);
+                    trailing = cvt4hex(ptr + 2);
+                    if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
+                        /* followed by trailing surrogate */
+                        cp |= trailing & 0x3FF;
+                        ptr += 6;
+                    } else {
+                        cp = -1; /* invalid */
                     }
-                    cp <<= 4;
-                    cp |= hex2decimal(*ptr);
-                    ptr++;
                 }
                 if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
                     parse_error(ctxt, token,
-                                "\\u%.4s is not a valid Unicode character",
-                                ptr - 3);
+                                "%.*s is not a valid Unicode character",
+                                (int)(ptr - beg), beg);
The error reporting here has indeed been improved over v1.
Reviewed-by: Eric Blake <[email protected]>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org