On 07/11/2017 08:46 PM, Thomas Huth wrote:
+        s1 = cpu_ldub_data_ra(env, addr + 1, ra);
+        s2 = cpu_ldub_data_ra(env, addr + 2, ra);
+        c = s0 & 0x0f;
+        c = (c << 6) | (s1 & 0x3f);
+        c = (c << 6) | (s2 & 0x3f);
+        /* Fold the byte-by-byte range descriptions in the PoO into
+           tests against the complete value.  It disallows encodings
+           that could be smaller, and the UTF-16 surrogates.  */
+        if (enh_check
+            && ((s1 & 0xc0) != 0x80
+                || (s2 & 0xc0) != 0x80
+                || c < 0x1000
+                || (c >= 0xd800 && c <= 0xdfff))) {
+            return 2;
+        }
+    } else if (s0 <= (enh_check ? 0xf4 : 0xf7)) {
+        /* four byte character */
+        l = 4;
+        if (ilen < 4) {
+            return 0;
+        }
+        s1 = cpu_ldub_data_ra(env, addr + 1, ra);
+        s2 = cpu_ldub_data_ra(env, addr + 2, ra);
+        s3 = cpu_ldub_data_ra(env, addr + 3, ra);
+        c = s0 & 0x0f;
I think you could also use 0x07 instead of 0x0f here. Shouldn't matter
much due to the s0 <= 0xf7 check above, though.


You're right that it doesn't matter due to the check, but fixed anyway. It makes the pattern wrt 2, 3, and 4 byte encodings more apparent.


r~

Reply via email to