Teemu Likonen <[EMAIL PROTECTED]> writes:

> The problem is that the field width "%5s" is calculated wrong because
> the letter "ä" takes two bytes in UTF-8 but is only one character.

According to SUSv3, the printf utility interprets the format
string as specified under "File Format Notation", apart from
various exceptions which do not apply here.  And there, it says
that the field width and precision control the number of bytes,
rather than characters.  This is also how the printf function
works.
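
To make the difference concrete, here is a small sketch in Python rather than shell or C. It assumes that Python's bytes formatting pads by bytes, in the same way as the C printf function, and that its str formatting pads by code points. The variable names are only for illustration.

```python
text = "ä"                       # U+00E4, one character
encoded = text.encode("utf-8")   # b'\xc3\xa4', two bytes

# Width counted in bytes, as with "%5s" in the C printf function:
# two bytes of data, so only three bytes of padding are added.
byte_padded = b"%5s" % encoded

# Width counted in characters (code points): four spaces of padding.
char_padded = "%5s" % text

print(len(encoded))                       # 2
print(byte_padded)                        # b'   \xc3\xa4'
print(len(byte_padded.decode("utf-8")))   # 4
print(len(char_padded))                   # 5
```

The byte-counted result is five bytes long but only four characters wide on screen, which is the misalignment described in the quoted report.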

I suppose bash without --posix could be changed to behave
otherwise.  However, I doubt the usefulness of merely counting
Unicode characters.  If you want to align columns of a table, you
should also consider control characters, spacing and nonspacing
combining characters, and fullwidth ideographic characters.
If a simple character-counting feature were added now, perhaps
with new syntax such as "%5Uc", extending it later to handle
those cases too might cause compatibility problems.
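
As a rough illustration of why counting characters is not enough, the following Python sketch estimates the number of terminal columns a string occupies. The functions display_width and pad_left are hypothetical helpers, not part of bash or any standard. The rules they apply (zero columns for control, format and nonspacing or enclosing combining characters, two columns for East Asian wide and fullwidth characters, one column otherwise) are simplifying assumptions; real terminals differ, and a tab, for example, is not really zero columns wide.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        category = unicodedata.category(ch)
        # Assumption: control (Cc), format (Cf), nonspacing (Mn) and
        # enclosing (Me) characters take up no column of their own.
        if category in ("Cc", "Cf", "Mn", "Me"):
            continue
        # Assumption: East Asian wide (W) and fullwidth (F) characters
        # take two columns; everything else takes one.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

def pad_left(text, columns):
    """Right-align text in a field of the given number of columns."""
    return " " * max(0, columns - display_width(text)) + text

print(display_width("\u00e4"))          # precomposed "ä"
print(display_width("a\u0308"))         # "a" followed by combining diaeresis
print(display_width("\u65e5\u672c"))    # two fullwidth ideographs
print(repr(pad_left("\u65e5\u672c", 5)))
```

Under these assumptions, both spellings of "ä" are one column wide although they contain one and two code points respectively, and the two ideographs are four columns wide although they are only two characters.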

