Vincent Lefevre wrote:
> I wonder whether anyone is interested in matching individual bytes
> in a file regarded as UTF-8 encoded. This seems weird.

It's not weird at all. For example, suppose we invent the notation
[[:error:]] to match encoding errors. Then the pattern '[[:error:]]'
would match all encoding errors in a file, which could well be a useful
thing.
Currently, for example, the tz package <http://www.iana.org/time-zones>
has a Make rule 'check_character_set' that verifies that the source
files are all properly encoded. It executes this shell command:

    ! grep -nv '^.*$' file names

This relies on GNU grep's behavior that "." does not match an encoding
error, so any line containing one fails to match '^.*$' and is printed
by -v. But the intent of that command is far from obvious. It'd be
simpler and clearer to write this:

    ! grep -n '[[:error:]]' file names

if such a feature were available.
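
For anyone who wants to try the existing idiom, here is a minimal sketch
of it wrapped in a shell function. The function name merely echoes the
Make rule, and the locale name C.UTF-8 is an assumption on my part: the
trick only works with GNU grep running in some UTF-8 locale that is
actually installed on the system, so substitute your own if needed.

```shell
# Succeed only if every line of every named file is validly encoded.
# Assumes GNU grep and an installed UTF-8 locale (C.UTF-8 here).
check_character_set() {
    # -v selects lines that do NOT match '^.*$', i.e. lines containing
    # an encoding error; '!' inverts grep's status so that finding such
    # a line makes the function fail.
    ! LC_ALL=C.UTF-8 grep -nv '^.*$' "$@"
}
```

Calling check_character_set on a list of files returns success when all
of them are clean, and otherwise fails after grep reports the offending
files.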