Hi NRK,

On Thu, Jul 04, 2024 at 03:04:42AM +0000, NRK wrote:
> Hello all,
>
> A couple days ago I was looking into how dmenu deals with invalid utf8
> sequences and noticed a couple odd things. Here's the testcase for those
> who want to follow along:
>
>     $ printf "0\xef1234567\ntest" | dmenu
>
> In drw.c::utf8decode(), invalid utf8 sequence is set to U+FFFD (�) and
> drw_text continues on doing it's width calculation as if there was a
> U+FFFD codepoint in the text.
>
> However when it comes to actually rendering the text via
> XftDrawStringUtf8(), we simply pass it `utf8str`; which obviously
> doesn't have any U+FFFD but instead has invalid utf8 sequences.
>
> I'm not sure if this is documented or not, but on my system xft
> basically just cuts the text off at the error. In other words, only 0 is
> rendered, followed by a large blank area (see pic0.png).
>
> Is this actually the expected behavior? If yes, then why not break out
> early on error instead of calculating width with a made up U+FFFD which
> will never be rendered?
>
> I have a rough patch which actually renders invalid utf8 as � instead of
> cutting it off (see pic1.png). IMO it's a nicer behavior. But I wanted
> to ask what everyone else expects before polishing the patch and sending
> it over.
>
> I also noticed that in utf8decode() there's this line:
>
>     if (j < len)
>         return 0;
>
> Is this ever reachable? If yes, wouldn't it be a infinite loop since
> `text` would never advance inside drw_text()?
>
> - NRK

It should indeed be the replacement character. I'd be interested to review the
patch.

Thank you,

-- 
Kind regards,
Hiltjo
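
For anyone following the thread, here is a minimal sketch of one way to make
the measured text and the rendered text agree: copy the input into a scratch
buffer, substituting U+FFFD for each invalid byte, and use that buffer for both
steps. This is not NRK's patch and not code from drw.c; the names
utf8_seqlen() and utf8_sanitize() are hypothetical, and replacing one byte at a
time (so a truncated multi-byte sequence yields several replacement characters)
is an assumption about the desired behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1..4) of the valid UTF-8 sequence at s,
 * or 0 if the bytes at s are invalid or truncated. */
static size_t
utf8_seqlen(const unsigned char *s, size_t n)
{
	size_t len, i;
	unsigned long cp;

	if (n == 0)
		return 0;
	if (s[0] < 0x80)
		return 1;
	if ((s[0] & 0xE0) == 0xC0) {
		len = 2; cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; cp = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (n < len)
		return 0; /* sequence cut off by end of input */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and out-of-range values */
	if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
	    (len == 4 && cp < 0x10000) ||
	    (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
		return 0;
	return len;
}

/* Hypothetical helper: copy src to dst, writing U+FFFD (EF BF BD) in place
 * of every invalid byte. Returns the number of bytes written, excluding the
 * terminating NUL, or (size_t)-1 if dst is too small. */
size_t
utf8_sanitize(const char *src, size_t srclen, char *dst, size_t dstsize)
{
	static const char repl[] = "\xEF\xBF\xBD";
	const unsigned char *s = (const unsigned char *)src;
	size_t i = 0, o = 0, n;

	while (i < srclen) {
		n = utf8_seqlen(s + i, srclen - i);
		if (n == 0) {
			/* invalid byte: emit U+FFFD and always advance by one,
			 * so the loop cannot get stuck on bad input */
			if (dstsize - o < 4)
				return (size_t)-1;
			memcpy(dst + o, repl, 3);
			o += 3;
			i += 1;
		} else {
			if (dstsize - o < n + 1)
				return (size_t)-1;
			memcpy(dst + o, src + i, n);
			o += n;
			i += n;
		}
	}
	if (o >= dstsize)
		return (size_t)-1;
	dst[o] = '\0';
	return o;
}
```

With the testcase from the quoted mail, the 9 input bytes "0", 0xEF, "1234567"
become 11 output bytes, with 0xEF replaced by EF BF BD. The sanitized buffer
could then, in principle, be handed to both the width calculation and
XftDrawStringUtf8(), though whether an extra copy is acceptable in drw_text()
is a separate question.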