Re: [dev] [dmenu] What's the expected behavior on invalid utf8?

Hiltjo Posthuma Thu, 04 Jul 2024 09:08:06 -0700

Hi NRK,

On Thu, Jul 04, 2024 at 03:04:42AM +0000, NRK wrote:
> Hello all,
> 
> A couple days ago I was looking into how dmenu deals with invalid utf8
> sequences and noticed a couple odd things. Here's the testcase for those
> who want to follow along:
> 
>       $ printf "0\xef1234567\ntest" | dmenu
> 
> In drw.c::utf8decode(), an invalid utf8 sequence is decoded as U+FFFD
> (�) and drw_text continues doing its width calculation as if there was
> a U+FFFD codepoint in the text.
> 
> However when it comes to actually rendering the text via
> XftDrawStringUtf8(), we simply pass it `utf8str`; which obviously
> doesn't have any U+FFFD but instead has invalid utf8 sequences.
> 
> I'm not sure if this is documented or not, but on my system xft
> basically just cuts the text off at the error. In other words, only 0 is
> rendered, followed by a large blank area (see pic0.png).
> 
> Is this actually the expected behavior? If yes, then why not break out
> early on error instead of calculating width with a made up U+FFFD which
> will never be rendered?
> 
> I have a rough patch which actually renders invalid utf8 as � instead of
> cutting it off (see pic1.png). IMO it's a nicer behavior. But I wanted
> to ask what everyone else expects before polishing the patch and sending
> it over.
> 
> I also noticed that in utf8decode() there's this line:
> 
>       if (j < len)
>               return 0;
> 
> Is this ever reachable? If yes, wouldn't it be an infinite loop since
> `text` would never advance inside drw_text()?
> 
> - NRK


It should indeed be the replacement character. I'd be interested to review the
patch.
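
For anyone following along, here is a minimal sketch of the behavior
discussed above: each invalid sequence is replaced by the UTF-8 encoding of
U+FFFD before the string would be handed to a renderer, so the measured
text and the drawn text agree. This is not the patch mentioned above and not
dmenu's actual code; decode() and sanitize() are hypothetical names, and the
choice to consume only the bytes seen so far on a broken sequence is an
assumption. The decoder always consumes at least one byte, so a caller's
loop cannot get stuck.

```c
#include <stddef.h>
#include <string.h>

#define UTF_INVALID 0xFFFD

/* Hypothetical helper, not dmenu's utf8decode().
 * Decode one codepoint from s (n bytes available, n > 0), store it in *cp
 * and return the number of bytes consumed. The return value is always >= 1,
 * so the caller is guaranteed to make progress. */
static size_t
decode(const unsigned char *s, size_t n, unsigned long *cp)
{
	/* smallest codepoint allowed for each sequence length (rejects overlongs) */
	static const unsigned long min[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t len, i;
	unsigned long u;

	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; u = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; u = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; u = s[0] & 0x07;
	} else {
		/* stray continuation byte or invalid lead byte */
		*cp = UTF_INVALID;
		return 1;
	}

	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			/* truncated or broken sequence: consume the bytes seen so far */
			*cp = UTF_INVALID;
			return i;
		}
		u = (u << 6) | (s[i] & 0x3F);
	}
	/* reject overlong encodings, out-of-range values and surrogates */
	if (u < min[len] || u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
		u = UTF_INVALID;
	*cp = u;
	return len;
}

/* Hypothetical helper: copy src to dst, writing the bytes EF BF BD (U+FFFD)
 * in place of each invalid sequence. dst must hold at least
 * 3 * srclen + 1 bytes. Returns the number of bytes written, excluding
 * the terminating NUL. */
static size_t
sanitize(const char *src, size_t srclen, char *dst)
{
	size_t in = 0, out = 0, k;
	unsigned long cp;

	while (in < srclen) {
		k = decode((const unsigned char *)src + in, srclen - in, &cp);
		if (cp == UTF_INVALID) {
			memcpy(dst + out, "\xEF\xBF\xBD", 3);
			out += 3;
		} else {
			memcpy(dst + out, src + in, k);
			out += k;
		}
		in += k;
	}
	dst[out] = '\0';
	return out;
}
```

With the first line of the testcase, sanitize("0\xef" "1234567", 9, buf)
leaves "0", then the three bytes of U+FFFD, then "1234567" in buf. The
string literal is split because a C hex escape would otherwise swallow the
digits that follow it.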

Thank you,

-- 
Kind regards,
Hiltjo
