This patch to the Go frontend and libgo rejects surrogate code points (0xd800 through 0xdfff) when converting an int to a string. They cannot be encoded as valid UTF-8, so they are turned into the replacement character, 0xfffd. The patch also makes the runtime reject a negative int in the same way. Negative ints were already rejected by the compiler, but not by the runtime. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch.
Ian
diff -r f16ad4ccc868 go/lex.cc
--- a/go/lex.cc	Fri Sep 21 23:32:36 2012 -0700
+++ b/go/lex.cc	Fri Sep 21 23:42:31 2012 -0700
@@ -1312,6 +1312,12 @@
 	  // Turn it into the "replacement character".
 	  v = 0xfffd;
 	}
+      if (v >= 0xd800 && v < 0xe000)
+	{
+	  warning_at(location, 0,
+		     "unicode code point 0x%x is invalid surrogate pair", v);
+	  v = 0xfffd;
+	}
       if (v <= 0xffff)
 	{
 	  buf[0] = 0xe0 + (v >> 12);
diff -r f16ad4ccc868 libgo/runtime/go-int-to-string.c
--- a/libgo/runtime/go-int-to-string.c	Fri Sep 21 23:32:36 2012 -0700
+++ b/libgo/runtime/go-int-to-string.c	Fri Sep 21 23:42:31 2012 -0700
@@ -17,6 +17,11 @@
   unsigned char *retdata;
   struct __go_string ret;
 
+  /* A negative value is not valid UTF-8; turn it into the replacement
+     character.  */
+  if (v < 0)
+    v = 0xfffd;
+
   if (v <= 0x7f)
     {
       buf[0] = v;
@@ -34,6 +39,10 @@
 	 "replacement character".  */
       if (v > 0x10ffff)
 	v = 0xfffd;
+      /* If the value is a surrogate pair, which is invalid in UTF-8,
+	 turn it into the replacement character.  */
+      if (v >= 0xd800 && v < 0xe000)
+	v = 0xfffd;
 
       if (v <= 0xffff)
 	{
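
For reference, here is a minimal standalone sketch of the behavior after the patch. It is not the libgo code: the function name encode_utf8 and its signature are hypothetical, chosen only for illustration, and the three validity checks are folded into one test up front rather than placed where the patch puts them.

```c
/* Hypothetical helper, not part of libgo: encode code point V as
   UTF-8 into BUF (which must hold at least 4 bytes) and return the
   number of bytes written.  */
int
encode_utf8 (long v, unsigned char *buf)
{
  /* Negative values, values above 0x10ffff, and surrogate code
     points (0xd800 through 0xdfff) have no valid UTF-8 encoding;
     turn them into the replacement character.  */
  if (v < 0 || v > 0x10ffff || (v >= 0xd800 && v < 0xe000))
    v = 0xfffd;

  if (v <= 0x7f)
    {
      buf[0] = v;
      return 1;
    }
  if (v <= 0x7ff)
    {
      buf[0] = 0xc0 + (v >> 6);
      buf[1] = 0x80 + (v & 0x3f);
      return 2;
    }
  if (v <= 0xffff)
    {
      buf[0] = 0xe0 + (v >> 12);
      buf[1] = 0x80 + ((v >> 6) & 0x3f);
      buf[2] = 0x80 + (v & 0x3f);
      return 3;
    }
  buf[0] = 0xf0 + (v >> 18);
  buf[1] = 0x80 + ((v >> 12) & 0x3f);
  buf[2] = 0x80 + ((v >> 6) & 0x3f);
  buf[3] = 0x80 + (v & 0x3f);
  return 4;
}
```

With this sketch, encode_utf8 (0xd800, buf) and encode_utf8 (-1, buf) both write the three bytes 0xef 0xbf 0xbd, which is the UTF-8 encoding of 0xfffd.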