On 26/03/15 01:22PM, amrit44404 wrote:
> Raw C1 bytes are not valid UTF-8. utf8decode() returns U+FFFD for
> them which gets drawn on screen as a replacement character.
> 
> Fix this by skipping C1 bytes in twrite() before utf8decode() sees
> them. The ESC_STR guard lets them through when inside a STR sequence
> so they can still act as sequence terminators.

What is there to fix? IMHO, invalid UTF-8 should show as a replacement
character. Silently ignoring it is at best a matter of personal
preference. There are also counterexamples which support the current
behavior. For example, the Linux console (VT, VC) also displays
U+FFFD for invalid UTF-8 bytes when in UTF-8 mode, even if they are
meant as part of escape sequences. Try

        printf '\e%%G\x9b1mBold\x9b0m\n'

in Linux VC (Ctrl+Alt+Fn). In Artix Linux, this will output

        ■1mBold■0m

including the characters U+FFFD which represent invalid UTF-8, because
ESC % G puts the console in UTF-8 mode, but

        printf '\e%%@\x9b1mBold\x9b0m\n'

prints the text "Bold" in bold, since ESC % @ leaves UTF-8 mode and
the raw 0x9b byte is then taken as CSI.
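
To illustrate why a raw 0x9b cannot be decoded, here is a minimal
sketch of a UTF-8 decoder. It is not st's utf8decode(); the name
decode_one() and its interface are made up for this mail, and it is
simplified (it does not reject overlong 3/4-byte forms or surrogates).
The point is only that 0x9b on its own is a stray continuation byte,
while U+009B (CSI) encoded as UTF-8 is the two bytes 0xc2 0x9b.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/*
 * Hypothetical helper, not taken from st: decode one code point from
 * s (n bytes available). Stores the result in *cp and returns the
 * number of bytes consumed. Malformed input yields U+FFFD.
 */
size_t
decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t len, i;
	uint32_t u;

	*cp = REPLACEMENT;
	if (n == 0)
		return 0;
	if (s[0] < 0x80) {		/* plain ASCII */
		*cp = s[0];
		return 1;
	}
	if (s[0] < 0xC2)		/* 0x80..0xBF: stray continuation byte, */
		return 1;		/* 0xC0/0xC1: overlong lead byte */
	if (s[0] < 0xE0) {
		len = 2; u = s[0] & 0x1F;
	} else if (s[0] < 0xF0) {
		len = 3; u = s[0] & 0x0F;
	} else if (s[0] < 0xF5) {
		len = 4; u = s[0] & 0x07;
	} else {
		return 1;		/* 0xF5..0xFF never occur in UTF-8 */
	}
	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80)
			return i;	/* truncated or broken sequence */
		u = (u << 6) | (s[i] & 0x3F);
	}
	*cp = u;
	return len;
}
```

Feeding it the single byte 0x9b gives U+FFFD, while the pair
0xc2 0x9b gives U+009B.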
