> Sed silently ignores (or what it does? - no info) invalid
> multibyte sequences in the input: no halt, no message,
> no false exit-code.

This is unfortunate but expected.  In a multibyte locale, "." does not
match an invalid sequence.  See the UTF-8 fast path in
check_node_accept_bytes in lib/regexec.c, where returning 0 means that
"." does not match:

      unsigned char c = re_string_byte_at (input, str_idx), d;
      if (BE (c < 0xc2, 1))
        return 0;

      if (str_idx + 2 > input->len)
        return 0;

      d = re_string_byte_at (input, str_idx + 1);
      if (c < 0xe0)
        return (d < 0x80 || d > 0xbf) ? 0 : 2;
      else if (c < 0xf0)
        {
          char_len = 3;
          if (c == 0xe0 && d < 0xa0)
            return 0;
        }
      else if (c < 0xf8)
        {
          char_len = 4;
          if (c == 0xf0 && d < 0x90)
            return 0;
        }
      else if (c < 0xfc)
        {
          char_len = 5;
          if (c == 0xf8 && d < 0x88)
            return 0;
        }
      else if (c < 0xfe)
        {
          char_len = 6;
          if (c == 0xfc && d < 0x84)
            return 0;
        }
      else
        return 0;

      if (str_idx + char_len > input->len)
        return 0;

      for (i = 1; i < char_len; ++i)
        {
          d = re_string_byte_at (input, str_idx + i);
          if (d < 0x80 || d > 0xbf)
            return 0;
        }
      return char_len;
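
To experiment with these checks outside of sed, they can be put into a
small standalone function.  This is only a sketch: utf8_seq_len and
utf8_seq_len_selfcheck are hypothetical names, not part of sed, glibc or
gnulib, and the function reads from a plain byte buffer instead of a
re_string_t.  Like the excerpt, it still accepts the legacy 5- and 6-byte
forms and is not meant as a complete UTF-8 validator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the checks in the excerpt: return the
   length in bytes of the multibyte UTF-8 sequence starting at buf[idx],
   or 0 if no valid multibyte sequence starts there.  */
size_t
utf8_seq_len (const unsigned char *buf, size_t len, size_t idx)
{
  unsigned char c, d;
  size_t char_len, i;

  if (idx >= len)
    return 0;

  c = buf[idx];
  /* Bytes below 0xc2 cannot start a multibyte sequence: ASCII is
     single-byte, 0x80..0xbf are continuation bytes, and 0xc0/0xc1 would
     only form overlong encodings.  */
  if (c < 0xc2)
    return 0;

  if (idx + 2 > len)
    return 0;

  d = buf[idx + 1];
  if (c < 0xe0)
    return (d < 0x80 || d > 0xbf) ? 0 : 2;
  else if (c < 0xf0)
    {
      char_len = 3;
      if (c == 0xe0 && d < 0xa0)	/* overlong 3-byte form */
        return 0;
    }
  else if (c < 0xf8)
    {
      char_len = 4;
      if (c == 0xf0 && d < 0x90)	/* overlong 4-byte form */
        return 0;
    }
  else if (c < 0xfc)
    {
      char_len = 5;
      if (c == 0xf8 && d < 0x88)
        return 0;
    }
  else if (c < 0xfe)
    {
      char_len = 6;
      if (c == 0xfc && d < 0x84)
        return 0;
    }
  else
    return 0;

  /* The whole sequence must fit in the buffer.  */
  if (idx + char_len > len)
    return 0;

  /* Every trailing byte must be a continuation byte (0x80..0xbf).  */
  for (i = 1; i < char_len; ++i)
    {
      d = buf[idx + i];
      if (d < 0x80 || d > 0xbf)
        return 0;
    }
  return char_len;
}

/* Small self-check: one valid, one malformed, one truncated sequence.  */
void
utf8_seq_len_selfcheck (void)
{
  assert (utf8_seq_len ((const unsigned char *) "\xc3\xa9", 2, 0) == 2);
  assert (utf8_seq_len ((const unsigned char *) "\xc3\x28", 2, 0) == 0);
  assert (utf8_seq_len ((const unsigned char *) "\xe2\x82", 2, 0) == 0);
}
```

A result of 0 for a byte of 0xc2 or above corresponds to the case
discussed here: "." has nothing to match at that position, so the bad
bytes are left alone.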

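If the input may contain invalid multibyte sequences, run sed with LANG=C
(or LC_ALL=C, in case LC_ALL or LC_CTYPE is already set in your
environment), so that every byte is treated as a single character.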

Do you think it could be worthwhile then to add a `z' command to zap the
current buffer independent of the presence of invalid multibyte sequences?

Paolo



