On 26/03/15 01:22PM, amrit44404 wrote:
> Raw C1 bytes are not valid UTF-8. utf8decode() returns U+FFFD for
> them which gets drawn on screen as a replacement character.
>
> Fix this by skipping C1 bytes in twrite() before utf8decode() sees
> them. The ESC_STR guard lets them through when inside a STR sequence
> so they can still act as sequence terminators.
What is there to fix? IMHO, invalid UTF-8 should show as a replacement
character. Silently ignoring it is, at best, a matter of personal
preference. There are, however, precedents which support the current
behavior. For example, the Linux console (VT, VC) also displays
U+FFFD for invalid UTF-8 bytes when in UTF-8 mode, even if they are
meant as part of escape sequences. Try
printf '\e%%G\x9b1mBold\x9b0m\n'
in a Linux VC (Ctrl+Alt+Fn). On Artix Linux, this outputs
■1mBold■0m
including the U+FFFD characters which represent invalid UTF-8, because
ESC % G has switched the console to UTF-8 mode, but
printf '\e%%@\x9b1mBold\x9b0m\n'
which selects the default (non-UTF-8) mode with ESC % @, prints the
text "Bold" in bold.
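
To make the UTF-8 point concrete, here is a minimal sketch of what a
strict decoder does with the same bytes. It uses Python's built-in
decoder as a stand-in; it is not st's utf8decode() nor the kernel's
code, and the names RAW_CSI, ENCODED_CSI and decode_with_replacement
are made up for this illustration.

```python
# Sketch only: Python's UTF-8 decoder stands in for a strict decoder.
# RAW_CSI, ENCODED_CSI and decode_with_replacement are illustrative names.

RAW_CSI = b"\x9b"                      # the lone C1 byte from the printf examples
ENCODED_CSI = "\x9b".encode("utf-8")   # U+009B encoded as proper UTF-8


def decode_with_replacement(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each invalid byte."""
    return data.decode("utf-8", errors="replace")


# Same byte string as in the printf examples, minus the mode switch.
raw_result = decode_with_replacement(RAW_CSI + b"1mBold" + RAW_CSI + b"0m")
encoded_result = decode_with_replacement(ENCODED_CSI)

print(ENCODED_CSI)            # b'\xc2\x9b'
print(ascii(raw_result))      # '\ufffd1mBold\ufffd0m'
print(ascii(encoded_result))  # '\x9b'
```

A lone 0x9b is a continuation byte with no lead byte, so it decodes to
U+FFFD, while the two-byte form 0xc2 0x9b decodes to U+009B.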