When using terminal software on a non-OpenBSD system to connect to my OpenBSD IRC machine, I noticed that the local terminal sometimes disagrees with the remote tmux and application (in this case, irssi) about the character width of some lines, which causes various kinds of breakage. Those lines happened to contain soft hyphens (U+00AD, SHY), which behave as follows on a few different operating systems:

OpenBSD-CURRENT:
	iswprint(SHY) = 1
	iswcntrl(SHY) = 1
	wcwidth(SHY) = 0
NetBSD 9.1:
	iswprint(SHY) = 1
	iswcntrl(SHY) = 0
	wcwidth(SHY) = 1
FreeBSD 12.2:
	iswprint(SHY) = 0
	iswcntrl(SHY) = 1
	wcwidth(SHY) = -1
glibc (Debian sid):
	iswprint(SHY) = 1
	iswcntrl(SHY) = 0
	wcwidth(SHY) = 1
musl (Alpine 3.13.3):
	iswprint(SHY) = 1
	iswcntrl(SHY) = 0
	wcwidth(SHY) = 1

On Windows, PowerShell, PuTTY and MinTTY (shipped as part of MSYS2 with the default install of git from git-scm.com) render the soft hyphen as a visible character with a width of 1 column.

The OpenBSD wcwidth(SHY) of 0 is what the problem comes down to (FreeBSD's return values are also strange, but this is an OpenBSD list). There is a lot of background discussion about whether Unicode intends the SHY to be printable, and whether it should have a width of 0 or 1, e.g. in [0] and [1]. For better or worse, it seems most other systems decided that SHY has a width of 1 and should be a visible character (at least in terminal contexts).

Therefore, in the interest of interoperability, I propose the following diff to special-case SHY into having a width of 1. I don't intend to go down the rabbit hole of discussing what the 'correct' width is, but the discrepancy with other systems causes real problems, and I think those other systems made their decisions years ago (see e.g. [0] for glibc).

Diff below only for gen_ctype_utf8.pl; I am not including the resulting en_US.UTF-8.src diff, because it seems there is a Unicode 12.1.0 to 13.0.0 update that happens on regeneration of that file, and that is orthogonal to this change (essentially: [2], which has not been committed yet).

[0]: https://sourceware.org/bugzilla/show_bug.cgi?id=22073
[1]: https://jkorpela.fi/shy.html
[2]: https://marc.info/?l=openbsd-tech&m=161534047428793&w=2

diff --git a/share/locale/ctype/gen_ctype_utf8.pl b/share/locale/ctype/gen_ctype_utf8.pl
index e23472efb2c..c593dc628ee 100755
--- a/share/locale/ctype/gen_ctype_utf8.pl
+++ b/share/locale/ctype/gen_ctype_utf8.pl
@@ -404,6 +404,9 @@ sub codepoint_columns
 
 	# Several fonts provide glyphs in this range
 	return 1 if $code >= 0xe000 and $code <= 0xf8ff;
+	# Soft hyphen (SHY) is in category Cf, which implies width 0, but since
+	# it is width 1 in nearly every other environment, set it here.
+	return 1 if $code == 0x00ad;
 
 	return 0 if $charinfo->{category} eq 'Mn';
 	return 0 if $charinfo->{category} eq 'Me';

-- 
Lauri Tirkkonen | lotheac @ IRCnet