Teemu Likonen <[EMAIL PROTECTED]> writes:

> The problem is that the field width "%5s" is calculated wrong because
> the letter "ä" takes two bytes in UTF-8 but is only one character.
According to SUSv3, the printf utility interprets the format string as specified under "File Format Notation", apart from various exceptions which do not apply here. And there, it says that the field width and precision control the number of bytes, rather than characters. This is also how the printf function works.

I suppose bash without --posix could be changed to behave otherwise. However, I doubt the usefulness of merely counting Unicode characters. If you want to align columns of a table, you should also consider control characters, spacing and nonspacing combining characters, and fullwidth ideographic characters. If a simple character-counting feature were added now, perhaps with new syntax such as "%5Uc", extending it later to handle those cases too might cause compatibility problems.