On 05.04.2012 21:18, Bruno Haible wrote:
> Hi Vladimir,
> mbsnwidth returns -1 in such a case only if the option MBSW_REJECT_INVALID
> is passed as third argument. If you pass 0, mbsnwidth will not return -1;
> instead, it will assume width 1 for every invalid byte or unprintable
> character.

Ok, I will use mbsnwidth instead then.

>>> - The function __argp_get_display_len looks very similar to mbsnwidth(),
>> Remaining is the issue due to escape sequences.
> What is the use case? PO file editors are not required to support editing
> of strings with control characters. msgfmt warns when a message in a PO file
> contains an unusual control character like ESC.

Unfortunately it doesn't warn often enough (see my post on bug-gettext, which
has not been answered yet). In particular, it accepts the file cpio/ko.po from
TP with no warnings even though it contains loads of \e. Some other control
characters like \b are ignored as well. (In reality the file uses an
unsupported ISO-2022 variant, an encoding that relies on many escapes, and not
EUC-KR as it claims.)

>> it is used in en@boldquot
> Ah, right. But I don't know how frequently it is used; maybe I and Simon
> were the only persons to ever use this? If we want to support this, not
> only mbswidth has to be modified, but basically any code that uses
> wcwidth - including libunistring. So, until this is discussed (and possibly
> generalized to more languages than 'en'), I propose to get away without
> it.

Ok. In the long term I see only 2 possible ways: deprecate en@boldquot or fix
all those places. I don't care if boldquot gets deprecated.

>> Done but the test is valid only for UTF-8 locales. Should I force some
>> specific locale? It's impossible to make a test working in all locales
>> since in case of e.g. ASCII we don't have such characters at all.
> In such a situation, it is best to split the test into two parts: a part
> that can be executed on every machine, and a part which can only be executed
> on a system with a UTF-8 locale. This way, the first part is not skipped
> just because the system has no UTF-8 locale.

Ok, will do. Can I include all the "normal" tests in the UTF-8 test for
simplicity?

> Please take a look how it's done in module 'mbsstr-tests':
> - test-mbsstr1.c is a test that doesn't need a particular locale.
> - test-mbsstr2.c is a test that requires a UTF-8 locale. We use the
>   French one for simplicity. (If a system does not have fr_FR.UTF-8
>   installed, it would be unlikely that it has ru_RU.UTF-8 installed.)
> - test-mbsstr2.sh is a wrapper script that uses the LOCALE_FR_UTF8
>   value, determined by m4/locale-fr.m4, and invokes test-mbsstr2.

Ok.

> +      if (wc == '\e' && ptr + 3 < end
> +          && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1')
> +          && ptr[3] == 'm')
> '\e' is not portable, only GCC supports it. Use '\x1b' or '\033' instead.
>
> Also, the test ptr + 3 < end is wrong. Should be written as
>   end - ptr > 3
> instead. (Think of ptr = 0xFFFFFFFD, end = 0xFFFFFFFE on a 32-bit machine.)
> Sure, on many systems this won't matter, because this memory range is
> either unmapped or occupied by the stack. But in general you have no guarantee
> that the memory page from 0xFFFFC000..0xFFFFFFFF will not be used for
> malloc().

I have already been bitten by this once on sparc64 with GRUB :(

> Bruno
>
>

-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
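
For reference, the two review points on the escape-sequence check (portable
spelling of ESC, and a bounds test that cannot wrap around) can be sketched as
a small standalone helper. This is only an illustration under assumptions: the
name is_sgr_bold_toggle is hypothetical and does not appear in the patch, and
the helper works on plain bytes rather than on the wide character wc used in
the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch.  Returns 1 if the bytes
   starting at PTR are ESC [ 0 m or ESC [ 1 m, and 0 otherwise.
   END points one past the last valid byte of the buffer.  */
static int
is_sgr_bold_toggle (const char *ptr, const char *end)
{
  /* Precondition: PTR lies within the buffer or at its end.  */
  assert (ptr <= end);

  /* end - ptr is the number of bytes that remain.  Comparing that count
     against 3 never forms a pointer beyond END, so nothing can wrap
     around, unlike the original test ptr + 3 < end.  */
  return (end - ptr > 3
          && ptr[0] == '\033'   /* ESC, written portably instead of '\e' */
          && ptr[1] == '['
          && (ptr[2] == '0' || ptr[2] == '1')
          && ptr[3] == 'm');
}
```

With ptr = 0xFFFFFFFD and end = 0xFFFFFFFE on a 32-bit machine, ptr + 3 can
wrap to a small address and compare as less than end, although only one byte
remains; end - ptr is 1 there, so the rewritten test rejects the sequence.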